
Gradient boosting machines

1. Gradient boosting machines

In this lesson, you will learn about gradient boosting machines, and how to fit gradient boosting models in R with the xgboost package.

2. How Gradient Boosting Works

Gradient boosting is an ensemble method that builds up a model incrementally, improving on the existing model at each step. Start by fitting a single, usually shallow, tree to the data. This is the first model, M_1.

3. How Gradient Boosting Works

Next, fit a second tree to the residuals of the current model, then find the weighted sum of that tree and the first one that gives the best fit to the data. This is M_2.

4. How Gradient Boosting Works

For regularized boosting, scale the contribution of each new tree down by a learning rate eta, between 0 and 1. An eta close to 1 gives faster learning but increases the risk of overfitting; a smaller eta slows the learning but lessens that risk.

5. How Gradient Boosting Works

Repeat until either the residuals are small enough, or the maximum number of iterations is reached.
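To make the loop concrete, here is a minimal sketch of this procedure for squared-error loss, using rpart trees as the base learners. The data frame df, outcome vector y, and the function name boost_sketch are placeholders for illustration, not objects from the lesson.

```r
# Minimal gradient boosting loop for squared-error loss (illustrative sketch).
library(rpart)

boost_sketch <- function(df, y, eta = 0.3, max_iter = 100, tol = 1e-6) {
  dat <- cbind(df, y = y)
  fit1 <- rpart(y ~ ., data = dat, maxdepth = 2)   # M_1: a single shallow tree
  pred <- predict(fit1, df)
  for (i in seq_len(max_iter)) {
    res <- y - pred                                # residuals of the current model
    if (mean(res^2) < tol) break                   # stop when residuals are small enough
    tree <- rpart(res ~ ., data = df, maxdepth = 2, cp = 0)  # fit a tree to the residuals
    pred <- pred + eta * predict(tree, df)         # add it, shrunk by the learning rate eta
  }
  pred
}
```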

6. Cross-validation to Guard Against Overfit

Because gradient boosting optimizes error on the training data, it's very easy to overfit the model. So best practice is to estimate out-of-sample error via cross-validation for each incremental model, then retroactively decide how many trees to use.

7. Best Practice (with xgboost())

In the xgboost package, the xgb.cv() function fits a model and calculates the cross-validated errors. Run this function with a large number of rounds.

8. Best Practice (with xgboost())

xgb.cv() records the estimated errors in the evaluation_log element of its output. You can use the evaluation_log to find the number of trees with the lowest estimated RMSE.

9. Best Practice (with xgboost())

Now run the xgboost function with the right number of trees to get the final model.

10. Example: Bike Rental Model

Let's work through an example with the January and February bike rental data. Since xgboost() can't work directly with categorical variables, we use vtreat to prepare the data, as sketched below.
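The data frame and column names in this sketch (bikesJan, cnt, and the input variables in vars) are assumptions for illustration; the point is the designTreatmentsZ() / prepare() pattern that converts the inputs to purely numeric columns.

```r
# Sketch of preparing the training data with vtreat (names are assumptions).
library(vtreat)

vars <- c("hr", "holiday", "workingday", "weathersit",
          "temp", "atemp", "hum", "windspeed")        # assumed input variables

# Design a treatment plan from the training data (no outcome needed: "Z").
treatplan <- designTreatmentsZ(bikesJan, vars, verbose = FALSE)

# Keep only the clean numeric ("clean") and indicator ("lev") variables.
sf <- treatplan$scoreFrame
newvars <- sf$varName[sf$code %in% c("clean", "lev")]

# Prepare the training data: all numeric, no NAs, ready for xgboost.
bikesJan.treat <- prepare(treatplan, bikesJan, varRestriction = newvars)
```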

11. Training a model with xgboost() / xgb.cv()

First run xgb.cv(), which takes a large number of arguments. Here we set the number of rounds to 100, with a tree depth of 6 and a learning rate eta of 0.3. We specify 5-fold cross-validation for estimating the out-of-sample error.
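Here is a sketch of that call, assuming the vtreat-prepared training data bikesJan.treat and the outcome column cnt from the previous step (both assumed names), with the reg:squarederror objective for squared-error regression.

```r
# Sketch of the cross-validation run (data and outcome names are assumptions).
library(xgboost)

cv <- xgb.cv(data = as.matrix(bikesJan.treat),
             label = bikesJan$cnt,
             nrounds = 100,                      # a deliberately large number of trees
             nfold = 5,                          # 5-fold cross-validation
             objective = "reg:squarederror",     # squared-error regression
             eta = 0.3,                          # learning rate
             max_depth = 6,                      # tree depth
             verbose = FALSE)
```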

12. Find the Right Number of Trees

The out-of-sample error estimates are in the column test_rmse_mean of the evaluation log. The index of the minimum value corresponds to the best number of trees. For this example, the best model had 78 trees.
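Continuing the sketch, the index of the smallest test_rmse_mean gives the number of trees to use.

```r
# Sketch: pick the number of trees that minimizes the cross-validated RMSE.
elog <- as.data.frame(cv$evaluation_log)
ntrees <- which.min(elog$test_rmse_mean)   # the lesson's example finds 78
```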

13. Run xgboost() for final model

Now run xgboost with the right number of trees to get the final model.
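A sketch of the final fit, reusing the assumed names from above:

```r
# Sketch of fitting the final model with the selected number of trees.
bike_model_xgb <- xgboost(data = as.matrix(bikesJan.treat),
                          label = bikesJan$cnt,
                          nrounds = ntrees,
                          objective = "reg:squarederror",
                          eta = 0.3,
                          max_depth = 6,
                          verbose = FALSE)
```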

14. Predict with an xgboost() model

The predict function for xgboost models takes a model and the input data. Again, we have to prepare the test data using vtreat. Our gradient boosting model has an RMSE of 54, a notable improvement over the quasipoisson and random forest models we fit earlier.
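A sketch of the prediction step, again with assumed names (bikesFeb for the February data): the test data must be prepared with the same treatment plan before calling predict().

```r
# Sketch: prepare the test data, predict, and compute RMSE (names assumed).
bikesFeb.treat <- prepare(treatplan, bikesFeb, varRestriction = newvars)

bikesFeb$pred <- predict(bike_model_xgb, as.matrix(bikesFeb.treat))

rmse <- sqrt(mean((bikesFeb$pred - bikesFeb$cnt)^2))   # the lesson reports about 54
```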

15. Visualize the Results

We can compare the February predictions to the actual February hourly bike rentals, both as a scatterplot of predictions versus actual values and as a plot of the two series against time.
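A sketch of the two plots with ggplot2; the column names cnt and pred and the time index instant are assumptions.

```r
# Sketch of the comparison plots (column names are assumptions).
library(ggplot2)

# Predictions versus actual hourly rentals.
ggplot(bikesFeb, aes(x = pred, y = cnt)) +
  geom_point() +
  geom_abline(color = "darkblue") +
  ggtitle("Actual vs. predicted hourly bike rentals, February")

# Predictions and actuals plotted against time.
ggplot(bikesFeb, aes(x = instant)) +
  geom_point(aes(y = cnt, color = "actual")) +
  geom_point(aes(y = pred, color = "predicted")) +
  scale_color_manual("", values = c(actual = "black", predicted = "darkblue")) +
  ggtitle("Hourly bike rentals over time, February")
```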

16. Let's practice!

Now let's practice using xgboost to build and run gradient boosting models.