Tuning the number of boosting rounds
Let's start with parameter tuning by seeing how the number of boosting rounds (the number of trees you build) impacts the out-of-sample performance of your XGBoost model. You'll use xgb.cv() inside a for loop and build one model per num_boost_round value.
Here, you'll continue working with the Ames housing dataset. The features are available in the array X, and the target vector is contained in y.
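To make "one boosting round = one more tree" concrete, here is a toy, pure-Python sketch of squared-error gradient boosting with one-split trees (stumps) on a made-up 1-D dataset. It is not XGBoost's algorithm (no regularization, no second-order split gain, no depth-3 trees); the names fit_stump, boost and rmse are hypothetical helpers written for this illustration. The learning rate of 0.3 mirrors XGBoost's default eta.

```python
import math

def fit_stump(x, r):
    """Least-squares stump on 1-D inputs: try every split threshold, keep the best."""
    best = None
    for t in sorted(set(x))[:-1]:  # drop the max so the right side is never empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def boost(x, y, n_rounds, eta=0.3):
    """Each boosting round fits one stump to the current residuals and adds it (scaled by eta)."""
    base = sum(y) / len(y)
    trees = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + eta * tree(xi) for xi, pi in zip(x, pred)]
    predict = lambda v: base + eta * sum(tree(v) for tree in trees)
    return predict, trees

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Made-up data: a sine curve plus a deterministic "noise" pattern.
x = [i / 10 for i in range(40)]
y = [math.sin(v) + 0.1 * ((i * 7) % 5 - 2) for i, v in enumerate(x)]

for n in [5, 10, 15]:
    model, trees = boost(x, y, n)
    print(n, "trees, training RMSE:", round(rmse(y, [model(v) for v in x]), 4))
```

Training error can only go down as rounds are added, so it says nothing about overfitting; that is why the exercise measures cross-validated (out-of-sample) RMSE instead.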
This exercise is part of the course “Extreme Gradient Boosting with XGBoost”.
Exercise instructions
- Create a DMatrix called housing_dmatrix from X and y.
- Create a parameter dictionary called params, passing in the appropriate "objective" ("reg:squarederror") and "max_depth" (set it to 3).
- Iterate over num_rounds inside a for loop and perform 3-fold cross-validation. In each iteration of the loop, pass in the current number of boosting rounds (curr_num_rounds) to xgb.cv() as the argument to num_boost_round.
- Append the final boosting round RMSE for each cross-validated XGBoost model to the final_rmse_per_round list.

num_rounds and final_rmse_per_round are then zipped and converted into a DataFrame so you can easily see how the model performs with each number of boosting rounds.
# Assumes X (features) and y (target) from the Ames housing dataset are already loaded,
# as in the original exercise environment.
import pandas as pd
import xgboost as xgb

# Create the DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary for each tree: params
params = {"objective": "reg:squarederror", "max_depth": 3}
# Create list of number of boosting rounds
num_rounds = [5, 10, 15]
# Empty list to store final round rmse per XGBoost model
final_rmse_per_round = []
# Iterate over num_rounds and build one model per num_boost_round value
for curr_num_rounds in num_rounds:
    # Perform 3-fold cross-validation: cv_results has one row per boosting round
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        num_boost_round=curr_num_rounds, metrics="rmse",
                        as_pandas=True, seed=123)
    # Append final round RMSE (last value of the test-rmse-mean column)
    final_rmse_per_round.append(cv_results["test-rmse-mean"].tail().values[-1])
# Print the resultant DataFrame
num_rounds_rmses = list(zip(num_rounds, final_rmse_per_round))
print(pd.DataFrame(num_rounds_rmses, columns=["num_boosting_rounds", "rmse"]))