Automated boosting round selection using early_stopping
Now, instead of attempting to cherry-pick the best possible number of boosting rounds, you can very easily have XGBoost automatically select the number of boosting rounds for you within xgb.cv(). This is done using a technique called early stopping.
Early stopping works by testing the XGBoost model against a hold-out dataset after every boosting round and halting the creation of additional boosting rounds (thereby finishing training of the model early) if the hold-out metric ("rmse" in our case) does not improve for a given number of rounds. Here you will use the early_stopping_rounds parameter in xgb.cv() with a large possible number of boosting rounds (50). Bear in mind that if the hold-out metric continuously improves up through when num_boost_round is reached, then early stopping does not occur.
Here, the DMatrix and parameter dictionary have been created for you. Your task is to use cross-validation with early stopping. Go for it!
This exercise is part of the course Extreme Gradient Boosting with XGBoost.
Instructions
- Perform 3-fold cross-validation with early stopping and "rmse" as your metric. Use 10 early stopping rounds and 50 boosting rounds. Specify a seed of 123 and make sure the output is a pandas DataFrame. Remember to specify the other parameters such as dtrain, params, and metrics.
- Print cv_results.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import XGBoost (X and y, the housing features and target, are assumed to be preloaded)
import xgboost as xgb

# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary for each tree: params
params = {"objective":"reg:squarederror", "max_depth":4}
# Perform cross-validation with early stopping: cv_results
cv_results = ____
# Print cv_results
print(____)
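For reference, here is one way to fill in the blanks, following the instructions above. This is a minimal sketch that reuses the housing_dmatrix and params objects defined in the scaffold; the dtrain, params, nfold, num_boost_round, early_stopping_rounds, metrics, as_pandas, and seed arguments are all part of xgb.cv()'s signature.

# Perform 3-fold cross-validation with early stopping: cv_results
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                    num_boost_round=50, early_stopping_rounds=10,
                    metrics="rmse", as_pandas=True, seed=123)

# Print cv_results
print(cv_results)

Because as_pandas=True, cv_results is a pandas DataFrame with one row per boosting round actually performed (columns such as train-rmse-mean and test-rmse-mean); if early stopping triggers, it will contain fewer than 50 rows, and the final row reflects the best hold-out RMSE.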