

{"id":8763,"date":"2018-02-20T12:19:02","date_gmt":"2018-02-20T06:49:02","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=8763"},"modified":"2025-07-28T15:28:18","modified_gmt":"2025-07-28T09:58:18","slug":"gradient-boosting-algorithm","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/","title":{"rendered":"Gradient Boosting Algorithm &#8211; Working and Improvements"},"content":{"rendered":"<div>\n<div class=\"\">\n<p>In this Machine Learning Tutorial, we will study Gradient Boosting Algorithm. Also, we will learn Boosting Algorithm history &amp; purpose. 
Along with this, we will study how the Gradient Boosting algorithm works, and finally, we will discuss improvements to the basic algorithm.<\/p>\n<h3>What is Gradient Boosting in Machine Learning?<\/h3>\n<\/div>\n<\/div>\n<div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Gradient boosting is a <strong><a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-tutorial\/\">machine learning<\/a> <\/strong>technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.<\/div>\n<\/div>\n<div class=\"\">\n<div><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The accuracy of a predictive model can <span class=\"passivevoice\">be boosted<\/span> in two ways:<\/div>\n<\/div>\n<div class=\"\">\n<div><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">a. By embracing feature engineering, or<\/div>\n<\/div>\n<div class=\"\">\n<div><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">b. 
By applying boosting algorithms directly.<\/div>\n<\/div>\n<div class=\"\">\n<div><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">There are many boosting algorithms, such as:<\/div>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Gradient Boosting,<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">XGBoost,<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">AdaBoost,<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">GentleBoost, etc.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Every boosting algorithm has its own underlying mathematics, so each one behaves slightly differently when applied.<\/div>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">History of Boosting Algorithm<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Boosting is one of the most powerful learning ideas introduced in the last twenty years. It was originally designed for classification problems, but it can <span class=\"passivevoice\">be extended<\/span> to regression as well. The motivation for boosting was a procedure that combines the outputs of many &#8220;weak&#8221; classifiers to produce a powerful &#8220;committee.&#8221; A weak classifier (e.g., a 
decision tree) is one whose error rate is only slightly better than random guessing.<\/div>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Purpose of Boosting Algorithm<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The purpose of a boosting algorithm is to apply the weak classification algorithm sequentially to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers $G_m(x)$, $m = 1, 2, \\dots, M$.<\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-algorithm\/\"><strong>Read about Machine Learning\u00a0Algorithms<\/strong><\/a><\/div>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Stagewise Additive Modeling<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Boosting builds an additive model:<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">$$F(x) = \\sum_{m=1}^M \\beta_m b(x; \\gamma_m)$$<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">where $b(x; \\gamma_m)$ is a tree and $\\gamma_m$ parameterizes its split variables, split points, and terminal-node predictions. With boosting, the parameters $(\\beta_m, \\gamma_m)$ are fit in a stagewise fashion: each new term is added without readjusting the terms already in the model. 
This slows the learning process down, so the model overfits less <span class=\"adverb\">quickly<\/span>.<\/div>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">AdaBoost<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">AdaBoost builds an additive logistic regression model by stagewise fitting.<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">In AdaBoost, we use an exponential loss function of the form:<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">$L(y, F(x)) = \\exp(-yF(x))$, with $y \\in \\{-1, +1\\}$. Its population minimizer is one half of the log-odds, the same as that of the negative binomial log-likelihood loss.<\/div>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Gradient Boosting Algorithm<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Friedman&#8217;s Gradient Boosting Algorithm generalizes this stagewise idea to any differentiable loss function $L(y, F(x))$. At each stage $m$, it computes the negative gradient of the loss with respect to the current model&#8217;s predictions, $r_{im} = -\\left[\\partial L(y_i, F(x_i)) \/ \\partial F(x_i)\\right]_{F = F_{m-1}}$, fits a regression tree to these pseudo-residuals, and adds the tree to the model.<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Source: The Elements of Statistical Learning<\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">\n<div id=\"attachment_8864\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8864\" class=\"wp-image-8864 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01.jpg\" alt=\"What is Gradient Boosting Algorithm\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01.jpg 1200w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Gradient-Boosting-Algorithm-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-8864\" class=\"wp-caption-text\">What is Gradient Boosting Algorithm<\/p><\/div>\n<\/div>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">a. Loss Functions and Gradients<\/h4>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Source: The Elements of Statistical Learning<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The negative gradient that each new tree is fitted to depends on the chosen loss. For example, for squared-error loss $L(y, F) = \\tfrac{1}{2}(y - F)^2$ it is simply the ordinary residual $y - F$, and for absolute-error loss $|y - F|$ it is $\\operatorname{sign}(y - F)$.<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The optimal number of iterations, T, and the learning rate, \u03bb, depend on each other: a smaller learning rate generally requires more iterations.<\/div>\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-applications\/\"><strong>Read about Machine Learning Applications in the real world<\/strong><\/a><\/div>\n<\/div>\n<\/div>\n<div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Stochastic Gradient Boosting Algorithm<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Friedman also proposed stochastic gradient boosting, in which a random subsample of the training data is drawn without replacement before estimating each next step. 
He found that this <span class=\"complexword\">additional<\/span> step improved performance.<\/div>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">How Does the Gradient Boosting Algorithm Work?<\/h3>\n<p class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The Gradient Boosting algorithm involves three elements:<\/p>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">A loss function to <span class=\"passivevoice\">be optimized<\/span>.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">A weak learner to make predictions.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">An additive model to add weak learners to <span class=\"complexword\">minimize<\/span> the loss function.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">a. Loss Function<\/h4>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The loss function used depends on the type of problem <span class=\"passivevoice\">being solved<\/span>, for example squared error for regression or logarithmic loss for classification.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">It must be differentiable. Many standard loss functions <span class=\"passivevoice\">are supported<\/span> by common implementations, and you can also define your own.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">b. 
Weak Learner<\/h4>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">We use decision trees as the weak learner in the gradient boosting algorithm.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><span class=\"adverb\">Specifically<\/span>, we use regression trees, which output real values at their leaves, so their outputs can <span class=\"passivevoice\">be added<\/span> together. This allows the outputs of subsequent models to <span class=\"passivevoice\">be added<\/span> in to \u201ccorrect\u201d the residuals in the predictions.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Trees are constructed in a greedy manner, choosing the best split points based on a criterion such as the reduction in squared error, so as to minimize the loss.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><span class=\"adverb\">Initially<\/span>, as in the case of AdaBoost, very short decision trees\u00a0with only a single split, called decision stumps, were used.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Generally, larger trees are used; trees with roughly 4 to 8 terminal nodes (leaves) are reported to work well in many applications.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">It is common to constrain the weak learners in specific ways. 
For example, they may be limited to a <span class=\"complexword\">maximum<\/span> number of layers, nodes, splits, or leaf nodes.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">This ensures that the learners remain weak but can still be constructed in a greedy manner.<\/li>\n<\/ul>\n<p>Larger trees can capture more complex structure in the data, such as interactions between features, which can make the model perform better. However, they also make the model more prone to overfitting, so cross-validation is often needed to choose the right tree size and number of iterations.<\/p>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">c. Additive Model<\/h4>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Trees are added one at a time, and existing trees in the model are not changed.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">We use a gradient descent procedure to <span class=\"complexword\">minimize<\/span> the loss when adding trees.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><span class=\"adverb\">Traditionally<\/span>, gradient descent is used to optimize a set of parameters, such as the coefficients in a regression equation or the weights in a neural network. After calculating the error or loss, the weights are updated to <span class=\"complexword\">minimize<\/span> that error.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Instead of parameters, we have weak learner sub-models, or more <span class=\"adverb\">specifically<\/span>, decision trees. 
After calculating the loss, to perform the gradient descent procedure, we must add a tree to the model that reduces the loss.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">We do this by parameterizing the tree and then changing its parameters to move in the right direction, that is, to reduce the residual loss. In practice, the new tree is fitted to the negative gradient of the loss (the pseudo-residuals) evaluated at the current model&#8217;s predictions.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<p class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\" style=\"text-align: left\"><a href=\"https:\/\/data-flair.training\/blogs\/advantages-and-disadvantages-of-machine-learning\/\"><strong>Learn about Pros And Cons of Machine Learning<\/strong><\/a><\/p>\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Improvements to Basic Gradient Boosting Algorithm<\/h3>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The gradient boosting algorithm is greedy and can overfit a training dataset <span class=\"adverb\">quickly<\/span>.<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">It can benefit from regularization methods that penalize various parts of the algorithm. 
These methods generally improve the performance of the algorithm by reducing overfitting. Four common enhancements are:<\/div>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Tree Constraints<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Shrinkage<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Random sampling<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Penalized Learning<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">a. Tree Constraints<\/h4>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">It is important that the weak learners have skill but remain weak.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">There are many ways to constrain the trees, such as limiting the tree depth, the number of leaf nodes, or the minimum number of observations per split.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">b. Weighted Updates<\/h4>\n<\/div>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The predictions of each tree are added together sequentially.<\/li>\n<\/ul>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The contribution of each tree to this sum can be weighted to slow down learning by the algorithm, giving the update $F_m(x) = F_{m-1}(x) + \\lambda \\beta_m b(x; \\gamma_m)$ with $0 &lt; \\lambda \\le 1$. This weighting is referred to as shrinkage or the learning rate.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">c. 
Stochastic Gradient Boosting algorithm<\/h4>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">A big insight behind bagging ensembles and random forests was to allow trees to be created greedily from random subsamples of the training dataset.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">The same idea can be used in gradient boosting to reduce the correlation between the trees in the sequence: at each iteration, a random subsample of the training data is drawn without replacement and used to fit the next tree.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">This variation of boosting is referred to as stochastic gradient boosting.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<h4 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">d. Penalized Gradient Boosting algorithm<\/h4>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\"><span class=\"complexword\">We can impose additional<\/span> constraints on the parameterized trees, beyond limiting their structure.<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Classical classification trees are not used as weak learners. Instead, a modified form called a regression tree is used, which has numeric values in the leaf nodes. In some literature, the values in the leaves of the trees are called weights.<\/div>\n<\/div>\n<div class=\"\">\n<div class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">As such, the leaf weight values of the trees can be regularized. 
For this, we use popular regularization functions, such as:<\/div>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">L1 regularization of weights.<\/li>\n<\/ul>\n<\/div>\n<div class=\"\">\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">L2 regularization of weights.<\/li>\n<\/ul>\n<p>These L1 and L2 penalties keep the leaf weights from growing too large, which reduces overfitting. This is especially important when applying the algorithm to high-dimensional data, where the risk of overfitting is higher.<\/p>\n<\/div>\n<div class=\"\">\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Conclusion<\/h3>\n<p>Gradient Boosting is a machine learning technique that builds a strong model by combining many weak ones. It works by creating decision trees one at a time, with each new tree trying to fix the mistakes made by the trees before it. This step-by-step correction process helps the model learn better and give more accurate results. It is especially effective on complex data with nonlinear relationships and interactions between features.<\/p>\n<p>Furthermore, if you have any queries, feel free to ask in the comment section.<br \/>\n<a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_learning\"><strong>For reference<\/strong><\/a><\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>In this Machine Learning Tutorial, we will study Gradient Boosting Algorithm. Also, we will learn Boosting Algorithm history &amp; purpose. 
Along with this, we will also study the working of Gradient Boosting Algorithm, at&#46;&#46;&#46;<\/p>\n","protected":false},"author":5,"featured_media":8862,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36],"tags":[238,2163,16477,5125,5658,8417,9456,10248,11309],"class_list":["post-8763","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-adaboost","tag-boosting-algorithm","tag-gradient-algorithm-working","tag-gradient-boosting-from-scratch","tag-history-of-boosting-algorithm","tag-loss-function","tag-penalized-learning","tag-purpose-of-boosting-algorithm","tag-random-sampling"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Gradient Boosting Algorithm - Working and Improvements - DataFlair<\/title>\n<meta name=\"description\" content=\"What is Gradient Boosting Algorithm- Improvements &amp; working on Gradient Boosting Algorithm, Tree Constraints, Shrinkage, Random sampling etc.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Gradient Boosting Algorithm - Working and Improvements - DataFlair\" \/>\n<meta property=\"og:description\" content=\"What is Gradient Boosting Algorithm- Improvements &amp; working on Gradient Boosting Algorithm, Tree Constraints, Shrinkage, Random sampling etc.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-02-20T06:49:02+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-28T09:58:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Boosting-Algorithm-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Gradient Boosting Algorithm - Working and Improvements - DataFlair","description":"What is Gradient Boosting Algorithm- Improvements & working on Gradient Boosting Algorithm, Tree Constraints, Shrinkage, Random sampling etc.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/","og_locale":"en_US","og_type":"article","og_title":"Gradient Boosting Algorithm - Working and Improvements - DataFlair","og_description":"What is Gradient Boosting Algorithm- Improvements & working on Gradient Boosting Algorithm, Tree Constraints, Shrinkage, Random sampling etc.","og_url":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-02-20T06:49:02+00:00","article_modified_time":"2025-07-28T09:58:18+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Boosting-Algorithm-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823"},"headline":"Gradient Boosting Algorithm &#8211; Working and Improvements","datePublished":"2018-02-20T06:49:02+00:00","dateModified":"2025-07-28T09:58:18+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/"},"wordCount":1237,"commentCount":1,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Boosting-Algorithm-01.jpg","keywords":["AdaBoost","Boosting Algorithm","Gradient Algorithm Working","Gradient boosting from scratch","history of boosting algorithm","Loss Function","Penalized Learning","Purpose of boosting algorithm","Random sampling"],"articleSection":["Machine Learning Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/","url":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/","name":"Gradient Boosting Algorithm - Working and Improvements - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Boosting-Algorithm-01.jpg","datePublished":"2018-02-20T06:49:02+00:00","dateModified":"2025-07-28T09:58:18+00:00","description":"What is Gradient Boosting Algorithm- Improvements & working on Gradient Boosting Algorithm, Tree Constraints, Shrinkage, Random sampling etc.","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Boosting-Algorithm-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Boosting-Algorithm-01.jpg","width":1200,"height":628,"caption":"What is Gradient Boosting Algorithm"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/gradient-boosting-algorithm\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Machine Learning Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/machine-learning\/"},{"@type":"ListItem","position":3,"name":"Gradient Boosting Algorithm &#8211; Working and Improvements"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. 
Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"DataFlair Team creates expert-level guides on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. 
Our goal is to empower learners with easy-to-understand content. Explore our resources for career growth and practical learning.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam1\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/8763","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=8763"}],"version-history":[{"count":7,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/8763\/revisions"}],"predecessor-version":[{"id":146265,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/8763\/revisions\/146265"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/8862"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=8763"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=8763"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=8763"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}