Gradient Boost for Classification


Here I would like to go through the steps of Gradient Boost for Classification. Gradient Boost Classification is very similar to the Gradient Boost Regression algorithm, with a few differences:

Target values in binary classification are 0s and 1s.
The loss function is Log Loss (as in logistic regression problems).
Predictions are expressed as log of odds and probabilities, which follow from the Log Loss function.

So we can apply the same mathematical algorithm we used in the previous post, taking the above differences into account.

Below are the steps of the Gradient Boost Classification algorithm when used with the Logistic Loss Function L(y,F(x)).
I have tried to avoid mathematical notation and have simplified all the steps.

1) Get the initial prediction: the log of odds and the probability of class = 1. Basically, we count the numbers of class 1s and class 0s, calculate P(class=1) = (number of 1s) / (number of rows), and then log(odds) = log(P/(1-P)).
Example: if we have balanced data we will have equal (or similar) numbers of 0s and 1s, so initially:
Predicted_Probability(class_1)=0.5
log(odds_1) = 0.
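Step 1 can be sketched in a few lines of plain Python. This is only an illustration: the function name initial_prediction and the sample label lists are made up for the example, and it assumes the labels contain at least one 0 and one 1 (otherwise the log of odds is undefined).

```python
import math

def initial_prediction(y):
    """Return (log_odds, probability) of class 1 for a list of 0/1 labels.

    Hypothetical helper for illustration; assumes both classes are present.
    """
    p = sum(y) / len(y)               # P(class=1) = share of 1s in the data
    log_odds = math.log(p / (1 - p))  # log(odds) = log(P / (1 - P))
    return log_odds, p

# Balanced data: equal numbers of 0s and 1s
balanced_log_odds, balanced_p = initial_prediction([1, 0, 1, 0])

# Imbalanced data: three 1s and one 0, so P = 0.75 and odds = 3
skewed_log_odds, skewed_p = initial_prediction([1, 1, 1, 0])
```

For the balanced list this gives a probability of 0.5 and a log of odds of 0, matching the example above; for the imbalanced list the log of odds is log(3).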

2) Let m index the weak learners (decision trees). We repeat the steps below for each tree (e.g. m=1 to m=100 when n_estimators=100):

a. Compute the residuals (True - Predicted_Probability) for each row of data, using the probabilities predicted so far. This is done iteratively: the residuals left by the trees built so far become the target for the next decision tree.

Example: for the first tree, Residual = True - 0.5 (the probability predicted in step 1), where True = 0 or 1 (per target class).

b. Fit a decision tree to the residuals.

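c. Compute the output value for each leaf in the tree. We cannot simply take the average of the values in the leaf as we did in regression, because the residuals are differences in probability while the prediction is updated on the log(odds) scale. Here we will use the following formula:
predicted_leaf_output = (sum of residuals in the leaf) / (sum of [Predicted_Probability * (1 - Predicted_Probability)] over the rows in the leaf)
where Predicted_Probability is the probability predicted for each row in the previous iteration.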

d. First, update the predicted log of odds for each row of data:
log(odds) = Previously_predicted_log(odds) + learning_rate * predicted_leaf_output

e. Then, calculate the probability for each row of data using log(odds):
P = odds/(1+odds), or equivalently P = exp(log(odds)) / [1 + exp(log(odds))].
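To make steps a, c, d and e concrete, here is a minimal sketch of a single boosting round in plain Python. It skips the tree fitting of step b: the list leaf_ids is a made-up stand-in that says which leaf each row landed in. The names boosting_round, sigmoid and leaf_ids, and the tiny data set, are assumptions for illustration only.

```python
import math

def sigmoid(log_odds):
    """P = exp(log_odds) / (1 + exp(log_odds)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def boosting_round(y, log_odds, leaf_ids, learning_rate=0.1):
    """One round of steps a, c, d, e for rows already assigned to leaves."""
    probs = [sigmoid(lo) for lo in log_odds]
    # Step a: residual = True - Predicted_Probability
    residuals = [yi - pi for yi, pi in zip(y, probs)]
    # Step c: leaf output = sum(residuals) / sum(p * (1 - p)) per leaf
    num, den = {}, {}
    for r, p, leaf in zip(residuals, probs, leaf_ids):
        num[leaf] = num.get(leaf, 0.0) + r
        den[leaf] = den.get(leaf, 0.0) + p * (1 - p)
    outputs = {leaf: num[leaf] / den[leaf] for leaf in num}
    # Step d: move each row's log(odds) by learning_rate * its leaf output
    new_log_odds = [lo + learning_rate * outputs[leaf]
                    for lo, leaf in zip(log_odds, leaf_ids)]
    # Step e: convert the updated log(odds) back to probabilities
    new_probs = [sigmoid(lo) for lo in new_log_odds]
    return outputs, new_log_odds, new_probs

# Four rows, initial log(odds) = 0 (probability 0.5) for a balanced target.
y = [1, 1, 0, 0]
leaf_ids = [0, 0, 0, 1]  # hypothetical split: three rows in leaf 0, one in leaf 1
outputs, new_log_odds, new_probs = boosting_round(y, [0.0] * 4, leaf_ids)
```

In leaf 0 the residuals are 0.5, 0.5 and -0.5, so the output is 0.5 / (3 * 0.25), roughly 0.667; in leaf 1 the single residual -0.5 gives -0.5 / 0.25 = -2. With learning_rate = 0.1 the log of odds moves only a little, to about 0.067 and -0.2, which is why many trees are needed.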

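3) Compute the final prediction F(x): the final log(odds) is the initial log(odds) plus learning_rate times the leaf output of every tree. Convert it to a probability as in step e and assign the class by comparing it with a threshold (commonly 0.5).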
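Putting the steps together, below is a rough end-to-end sketch in plain Python. To stay self-contained it swaps the decision tree for a one-split "stump" on a single numeric feature, which is a simplification and not how a library such as scikit-learn builds its trees. All names (fit_stump, fit, predict_proba, predict), the toy data and the 0.5 threshold are assumptions for illustration; the sketch also assumes both classes are present and the feature has at least two distinct values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, residuals):
    """Return the threshold whose two-leaf split minimizes squared error."""
    best_sse, best_t = None, None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        sse = 0.0
        for side in (True, False):
            group = [r for xi, r in zip(x, residuals) if (xi <= t) == side]
            mean = sum(group) / len(group)
            sse += sum((r - mean) ** 2 for r in group)
        if best_sse is None or sse < best_sse:
            best_sse, best_t = sse, t
    return best_t

def fit(x, y, n_estimators=20, learning_rate=0.1):
    # Step 1: initial log(odds) from the class counts
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))
    log_odds = [f0] * len(y)
    trees = []
    for _ in range(n_estimators):                          # Step 2
        probs = [sigmoid(lo) for lo in log_odds]
        residuals = [yi - pi for yi, pi in zip(y, probs)]  # a
        t = fit_stump(x, residuals)                        # b
        leaf = {}
        for side in (True, False):                         # c
            rows = [i for i, xi in enumerate(x) if (xi <= t) == side]
            num = sum(residuals[i] for i in rows)
            den = sum(probs[i] * (1 - probs[i]) for i in rows)
            leaf[side] = num / den
        trees.append((t, leaf[True], leaf[False]))
        log_odds = [lo + learning_rate * leaf[xi <= t]     # d
                    for lo, xi in zip(log_odds, x)]
    return f0, learning_rate, trees

def predict_proba(model, xi):
    # Step 3: initial log(odds) plus the scaled output of every tree
    f0, learning_rate, trees = model
    log_odds = f0 + sum(learning_rate * (left if xi <= t else right)
                        for t, left, right in trees)
    return sigmoid(log_odds)                               # e

def predict(model, xi, threshold=0.5):
    return 1 if predict_proba(model, xi) >= threshold else 0

# Toy data: small x values belong to class 0, large ones to class 1.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
model = fit(x, y)
```

On this toy data, predict(model, 2) returns 0 and predict(model, 5) returns 1, and the predicted probabilities move further from 0.5 as n_estimators grows.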

Abzal Seitkaziyev
