Automated boosting round selection using early_stopping
Now, instead of attempting to cherry-pick the best possible number of boosting rounds, you can easily have XGBoost select it for you automatically within xgb.cv(). This is done using a technique called early stopping.
Early stopping works by testing the XGBoost model after every boosting round against a hold-out dataset and stopping the creation of additional boosting rounds (thereby finishing training of the model early) if the hold-out metric ("rmse" in our case) does not improve for a given number of rounds. Here you will use the early_stopping_rounds parameter in xgb.cv() with a large possible number of boosting rounds (50). Bear in mind that if the hold-out metric continuously improves up through when num_boost_round is reached, then early stopping does not occur.
Here, the DMatrix and parameter dictionary have been created for you. Your task is to use cross-validation with early stopping. Go for it!
Exercise instructions
- Perform 3-fold cross-validation with early stopping and "rmse" as your metric. Use 10 early stopping rounds and 50 boosting rounds. Specify a seed of 123 and make sure the output is a pandas DataFrame. Remember to specify the other parameters such as dtrain, params, and metrics.
- Print cv_results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary for each tree: params
params = {"objective":"reg:squarederror", "max_depth":4}
# Perform cross-validation with early stopping: cv_results
cv_results = ____
# Print cv_results
print(____)
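For reference, one way the completed call might look is sketched below. It is not the graded solution: it assumes xgboost is imported as xgb and substitutes scikit-learn's California housing data for the X and y that the exercise environment preloads.

# Assumptions: xgboost imported as xgb; scikit-learn's California
# housing data stands in for the exercise's preloaded X and y.
import xgboost as xgb
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)

# Create the housing DMatrix and the parameter dictionary
housing_dmatrix = xgb.DMatrix(data=X, label=y)
params = {"objective": "reg:squarederror", "max_depth": 4}

# Perform 3-fold cross-validation with early stopping: training stops
# if hold-out "rmse" fails to improve for 10 consecutive rounds,
# otherwise it runs for the full 50 boosting rounds.
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                    num_boost_round=50, early_stopping_rounds=10,
                    metrics="rmse", as_pandas=True, seed=123)

# Print cv_results: one row per boosting round actually completed
print(cv_results)

Because as_pandas=True, cv_results is a pandas DataFrame, so its number of rows tells you how many boosting rounds ran before early stopping kicked in (at most 50).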