Gradient Boosting Regressor Example


In the previous post, I briefly explained Gradient Boosting using a classification problem. Here I give a step-by-step explanation of how a Gradient Boosting Regressor works, using sklearn and Python, to complement the theory given here. I did this exercise mainly to build an intuition for the processes inside gradient boosted trees, and thereby to avoid using them as some sort of 'black box' algorithm.

I used a dataset of car prices (source) for this purpose. To make the processes inside the gradient boosted trees easy to track, I used a small portion of the data, a minimal number of trees (m=2), and shallow trees (max_depth=2).
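As a rough sketch of this setup, the snippet below configures sklearn's GradientBoostingRegressor with two trees of depth two. The feature values and prices are made-up toy numbers, not the car-price dataset used in this post, and the learning rate of 0.1 is an assumption (it is sklearn's default). The number of trees m corresponds to sklearn's n_estimators parameter.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Made-up toy data (NOT the car-price dataset from the post).
# Columns are hypothetical "horsepower" and "weight" features.
X = np.array([[70, 900], [90, 1000], [110, 1100],
              [130, 1250], [150, 1400], [200, 1600]], dtype=float)
y = np.array([6000, 7500, 9000, 12000, 16500, 30000], dtype=float)

model = GradientBoostingRegressor(
    n_estimators=2,     # m = 2 trees
    max_depth=2,        # depth of each tree
    learning_rate=0.1,  # assumed value (sklearn default)
    random_state=0,
)
model.fit(X, y)

# estimators_ holds one regression tree per boosting stage.
print(model.estimators_.shape)
```

Keeping the model this small means every split and every leaf value can be checked by hand.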

1) First, we initialize the model by computing the initial prediction Pred_0, which is the mean of the prices in the train dataset. Then we calculate the initial residuals: Res_0 = train['price'] - Pred_0. See below.
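The initialization step can be sketched in a few lines of NumPy. The price array here is a made-up stand-in for train['price'], chosen so that the mean is a round number.

```python
import numpy as np

# Made-up prices standing in for train['price'] (not the real dataset).
price = np.array([6000, 7500, 9000, 12000, 16500, 30000], dtype=float)

# Initial prediction: the same constant (the mean price) for every row.
pred_0 = price.mean()

# Initial residuals: how far each actual price is from that constant.
res_0 = price - pred_0

print(pred_0)  # 13500.0
print(res_0)   # [-7500. -6000. -4500. -1500.  3000. 16500.]
```

The residuals around a mean always sum to zero, which is a quick sanity check for this step.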

2) Next, we fit the first tree to all data points, using each row's features as inputs and Res_0 as the target. This tree is built using 'MSE' as the split criterion.
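To illustrate this step in isolation, the sketch below fits a standalone DecisionTreeRegressor to the residuals instead of the prices. It uses made-up toy data and hypothetical feature columns. DecisionTreeRegressor splits on squared error (MSE) by default, so no criterion is passed explicitly. The loop compares each leaf's output with the mean residual of the rows that land in it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data (NOT the car-price dataset from the post).
X = np.array([[70, 900], [90, 1000], [110, 1100],
              [130, 1250], [150, 1400], [200, 1600]], dtype=float)
price = np.array([6000, 7500, 9000, 12000, 16500, 30000], dtype=float)
res_0 = price - price.mean()

# The target is Res_0, not the price itself.
tree_1 = DecisionTreeRegressor(max_depth=2, random_state=0)
tree_1.fit(X, res_0)

leaf_ids = tree_1.apply(X)          # index of the leaf each row falls into
output_value_1 = tree_1.predict(X)  # leaf value assigned to each row

for leaf in np.unique(leaf_ids):
    rows = leaf_ids == leaf
    # Leaf output next to the mean residual of the rows in that leaf.
    print(leaf, output_value_1[rows][0], res_0[rows].mean())
```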

The value of each leaf is the mean of the residuals that fall into that leaf. The prediction is then updated:
Pred_1 = Pred_0 + learning_rate*output_value_1

Then we calculate the new residuals:
Res_1 = train['price']-Pred_1
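One way to check the update rule is to reproduce it by hand and compare it with what sklearn computes after the first boosting stage. The sketch below again uses made-up toy data and an assumed learning rate of 0.1. It takes the first tree out of a fitted GradientBoostingRegressor via estimators_[0, 0] and compares the manual Pred_1 with the first item yielded by staged_predict.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Made-up toy data (NOT the car-price dataset from the post).
X = np.array([[70, 900], [90, 1000], [110, 1100],
              [130, 1250], [150, 1400], [200, 1600]], dtype=float)
price = np.array([6000, 7500, 9000, 12000, 16500, 30000], dtype=float)

learning_rate = 0.1  # assumed value
model = GradientBoostingRegressor(n_estimators=2, max_depth=2,
                                  learning_rate=learning_rate,
                                  random_state=0).fit(X, price)

# Manual version of the steps above.
pred_0 = price.mean()
first_tree = model.estimators_[0, 0]       # tree fitted on Res_0
output_value_1 = first_tree.predict(X)     # leaf value for each row
pred_1 = pred_0 + learning_rate * output_value_1
res_1 = price - pred_1

# sklearn's own prediction after the first boosting stage.
stage_1 = next(model.staged_predict(X))
print(np.allclose(pred_1, stage_1))
```

The second tree is then fitted on Res_1 in the same way the first tree was fitted on Res_0.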

Predictions and residuals for nodes #2, #3, #5, and #6: