Gradient Boosting
Gradient Boosting is an ensemble learning method that builds models sequentially, where each new model corrects the errors of the previous ones. It minimizes the loss function by fitting each new model to the residuals (errors) of the previous models, which for squared-error loss are exactly the negative gradient of the loss.
Step-by-Step Process of Gradient Boosting
Step 1: Initialize with a Weak Model
- Start with a simple prediction, often the mean (for regression) or log-odds (for classification).
- Example: If we are predicting house prices, we begin with the average price of all houses.
Step 2: Compute Residuals (Errors)
- Calculate how much each prediction differs from the actual value: Residual = Actual Value − Predicted Value
Step 3: Train a Decision Tree on the Residuals
- Fit a small decision tree (weak learner) to predict these residuals.
- This tree learns how to correct the previous prediction.
Step 4: Compute the New Prediction
- Update the prediction using a learning rate (η): F1(x) = F0(x) + η × Tree1(x)
- F0(x) is the initial prediction.
- η is a small value (e.g., 0.1 or 0.01) that controls how much each tree contributes.
- Tree1(x) is the weak learner trained on the residuals.
Step 5: Repeat the Process
- Compute new residuals based on F1(x) and fit another weak learner to correct errors further.
- The process continues for multiple iterations: F2(x)=F1(x)+η×Tree2(x)
- F3(x)=F2(x)+η×Tree3(x)
- Each step reduces the error by adding another tree that focuses on the residuals.
Step 6: Final Prediction
- The final model is the sum of the initial prediction and all the scaled weak learners: FM(x) = F0(x) + η × Tree1(x) + η × Tree2(x) + … + η × TreeM(x)
- Each weak learner contributes slightly to the final prediction, gradually improving accuracy (a minimal code sketch of this loop follows below).
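The loop above is short enough to write out directly. The sketch below is a minimal illustration, not a production implementation: it assumes squared-error loss (so the negative gradient is just the residual) and uses scikit-learn's DecisionTreeRegressor as the weak learner; the function names and default values are only for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_fit(X, y, n_trees=100, eta=0.1, max_depth=1):
    """Gradient boosting for squared-error regression (Steps 1-5)."""
    f0 = y.mean()                               # Step 1: initial prediction (the mean)
    pred = np.full(len(y), f0, dtype=float)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                    # Step 2: errors of the current model
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                  # Step 3: weak learner fit to residuals
        pred = pred + eta * tree.predict(X)     # Step 4: shrunken update
        trees.append(tree)                      # Step 5: repeat
    return f0, trees

def boost_predict(X, f0, trees, eta=0.1):
    """Step 6: final prediction = initial value + sum of scaled tree outputs."""
    pred = np.full(X.shape[0], f0, dtype=float)
    for tree in trees:
        pred = pred + eta * tree.predict(X)
    return pred
```

The same η used during fitting must also be used at prediction time, because each stored tree was trained to correct a model that already included the earlier shrunken updates.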
Example Problem: Predicting House Prices
We have the following dataset:
House Size (sq ft) | House Price (Y) |
---|---|
1000 | 150 |
1200 | 200 |
1500 | 250 |
1800 | 300 |
2000 | 350 |
Our goal is to predict house prices based on size using Gradient Boosting.
Step 1: Initial Prediction (Mean)
The first prediction is the mean of the target variable:
Ȳ = (150 + 200 + 250 + 300 + 350) / 5 = 1250 / 5 = 250
So, our initial prediction for all houses is 250 (the table below lists the resulting residuals, and a short NumPy check follows it).
House Size | Actual Price (Y) | Initial Prediction (F₀) | Residual (Y – F₀) |
---|---|---|---|
1000 | 150 | 250 | 150 – 250 = -100 |
1200 | 200 | 250 | 200 – 250 = -50 |
1500 | 250 | 250 | 250 – 250 = 0 |
1800 | 300 | 250 | 300 – 250 = 50 |
2000 | 350 | 250 | 350 – 250 = 100 |
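These numbers are easy to verify in a few lines of NumPy (the array names below are just illustrative):

```python
import numpy as np

size = np.array([1000, 1200, 1500, 1800, 2000])   # house size in sq ft
y = np.array([150., 200., 250., 300., 350.])      # actual prices

f0 = y.mean()          # initial prediction F0 = 250.0 for every house
residuals = y - f0     # [-100., -50., 0., 50., 100.]
print(f0, residuals)
```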
Step 2: Train a Weak Learner (Decision Tree) on Residuals
A small decision tree (a single split) is trained to predict these residuals; a scikit-learn version of this fit is sketched after the table.
Example Tree Splits:
- If House Size ≤ 1200, predict -75 (average of -100 and -50).
- If House Size > 1200, predict 50 (average of 0, 50, and 100).
House Size | Residual (Y – F₀) | Tree Prediction |
---|---|---|
1000 | -100 | -75 |
1200 | -50 | -75 |
1500 | 0 | 50 |
1800 | 50 | 50 |
2000 | 100 | 50 |
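As a rough check, a depth-1 DecisionTreeRegressor fit to these residuals reproduces the leaf averages above. Note that on this tiny dataset the split between 1200 and 1500 is tied (in squared error) with a split between 1500 and 1800, so the exact threshold a library picks can depend on its tie-breaking; the split shown here is one of the optimal choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1000], [1200], [1500], [1800], [2000]])   # feature matrix must be 2-D
residuals = np.array([-100., -50., 0., 50., 100.])

tree1 = DecisionTreeRegressor(max_depth=1)   # a single split ("stump") as the weak learner
tree1.fit(X, residuals)
print(tree1.predict(X))                      # with the ≤ 1200 split this prints [-75. -75. 50. 50. 50.]
```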
Step 3: Update Predictions with Learning Rate
Gradient Boosting updates the predictions using:
F1 = F0 + η × Tree Output
where η (learning rate) = 0.1.
House Size | Initial Prediction (250) | Tree Output | Updated Prediction (F₁) |
---|---|---|---|
1000 | 250 | -75 | 250 + (0.1 × -75) = 242.5 |
1200 | 250 | -75 | 242.5 |
1500 | 250 | 50 | 250 + (0.1 × 50) = 255 |
1800 | 250 | 50 | 255 |
2000 | 250 | 50 | 255 |
Step 4: Compute New Residuals
Now we compute the new residuals (the snippet after the table reproduces both this step and the update above):
r1 = Y − F1
House Size | Actual Price (Y) | Updated Prediction (F₁) | New Residual (Y – F₁) |
---|---|---|---|
1000 | 150 | 242.5 | 150 – 242.5 = -92.5 |
1200 | 200 | 242.5 | 200 – 242.5 = -42.5 |
1500 | 250 | 255 | 250 – 255 = -5 |
1800 | 300 | 255 | 300 – 255 = 45 |
2000 | 350 | 255 | 350 – 255 = 95 |
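Both the shrunken update in Step 3 and these new residuals can be reproduced with plain NumPy, hard-coding the first tree's outputs from the table above:

```python
import numpy as np

y = np.array([150., 200., 250., 300., 350.])
f0 = 250.0
tree1_output = np.array([-75., -75., 50., 50., 50.])   # first tree's predictions
eta = 0.1

f1 = f0 + eta * tree1_output   # [242.5, 242.5, 255., 255., 255.]
r1 = y - f1                    # [-92.5, -42.5, -5., 45., 95.]
print(f1, r1)
```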
Step 5: Train Another Weak Learner on New Residuals
A new decision tree is trained to predict the new residuals; the arithmetic for this second round is reproduced in a short snippet after Step 7.
Example Tree Splits:
- If House Size ≤ 1200, predict -67.5 (average of -92.5 and -42.5).
- If House Size > 1200, predict 45 (average of -5, 45, and 95).
House Size | New Residual (Y – F₁) | Tree Prediction |
---|---|---|
1000 | -92.5 | -67.5 |
1200 | -42.5 | -67.5 |
1500 | -5 | 45 |
1800 | 45 | 45 |
2000 | 95 | 45 |
Step 6: Update Predictions Again
F2 = F1 + η × Tree Output
House Size | Previous Prediction (F₁) | Tree Output | Updated Prediction (F₂) |
---|---|---|---|
1000 | 242.5 | -67.5 | 242.5 + (0.1 × -67.5) = 235.75 |
1200 | 242.5 | -67.5 | 235.75 |
1500 | 255 | 45 | 255 + (0.1 × 45) = 259.5 |
1800 | 255 | 45 | 259.5 |
2000 | 255 | 45 | 259.5 |
Step 7: Compute New Residuals Again
r2 = Y − F2
House Size | Actual Price (Y) | Updated Prediction (F₂) | New Residual (Y – F₂) |
---|---|---|---|
1000 | 150 | 235.75 | 150 – 235.75 = -85.75 |
1200 | 200 | 235.75 | 200 – 235.75 = -35.75 |
1500 | 250 | 259.5 | 250 – 259.5 = -9.5 |
1800 | 300 | 259.5 | 300 – 259.5 = 40.5 |
2000 | 350 | 259.5 | 350 – 259.5 = 90.5 |
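The second round is the same arithmetic with the new tree's outputs. The snippet below hard-codes the split used in this example (House Size ≤ 1200 vs. > 1200) rather than fitting a tree, since a greedy tree grown on these particular residuals would not necessarily choose exactly this split:

```python
import numpy as np

size = np.array([1000, 1200, 1500, 1800, 2000])
y = np.array([150., 200., 250., 300., 350.])
f1 = np.array([242.5, 242.5, 255., 255., 255.])      # predictions after the first round
eta = 0.1

tree2_output = np.where(size <= 1200, -67.5, 45.0)   # second tree's outputs, as in the table
f2 = f1 + eta * tree2_output   # [235.75, 235.75, 259.5, 259.5, 259.5]
r2 = y - f2                    # [-85.75, -35.75, -9.5, 40.5, 90.5]
print(f2, r2)
```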
Step 8: Repeat Until Convergence
- Train another weak learner on the new residuals.
- Update the predictions using the learning rate.
- Continue until the residuals become small or a chosen number of trees has been added (a scikit-learn version of the full procedure is sketched below).
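In practice this loop is rarely written by hand. With a recent version of scikit-learn, GradientBoostingRegressor runs the same procedure (initial mean prediction, shallow trees fit to residuals, shrunken updates) end to end; the hyperparameter values below are purely illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[1000], [1200], [1500], [1800], [2000]])   # house sizes
y = np.array([150., 200., 250., 300., 350.])             # house prices

model = GradientBoostingRegressor(
    n_estimators=100,       # number of boosting rounds (trees)
    learning_rate=0.1,      # η: how much each tree contributes
    max_depth=1,            # very shallow weak learners
    loss="squared_error",   # fit each tree to plain residuals
)
model.fit(X, y)
print(model.predict(X))     # approaches [150, 200, 250, 300, 350] as rounds accumulate
```

A smaller learning rate generally needs more trees to reach the same training error, so these two parameters are usually tuned together.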