
Tuning the number of boosting rounds

Let's start with parameter tuning by seeing how the number of boosting rounds (the number of trees you build) impacts the out-of-sample performance of your XGBoost model. You'll use xgb.cv() inside a for loop, building one model for each value of num_boost_round.

Here, you'll continue working with the Ames housing dataset. The features are available in the array X, and the target vector is contained in y.
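If you are recreating this outside the exercise environment, a minimal sketch along these lines would produce X and y (the file name and the assumption that the target column comes last are hypothetical; inside the exercise, X and y are already loaded for you):

import pandas as pd

# Hypothetical setup: inside the exercise, X and y are preloaded
housing_data = pd.read_csv("ames_housing_trimmed_processed.csv")  # file name is an assumption
X = housing_data.iloc[:, :-1].values   # feature columns
y = housing_data.iloc[:, -1].values    # target column (sale price), assumed to be last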


Exercise instructions

  • Create a DMatrix called housing_dmatrix from X and y.
  • Create a parameter dictionary called params, passing in the appropriate "objective" ("reg:squarederror") and "max_depth" (set it to 3).
  • Iterate over num_rounds inside a for loop and perform 3-fold cross-validation. In each iteration of the loop, pass in the current number of boosting rounds (curr_num_rounds) to xgb.cv() as the argument to num_boost_round.
  • Append the final boosting round RMSE for each cross-validated XGBoost model to the final_rmse_per_round list.
  • num_rounds and final_rmse_per_round have been zipped and converted into a DataFrame so you can easily see how the model performs with each boosting round. Hit 'Submit Answer' to see the results!

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the DMatrix: housing_dmatrix
housing_dmatrix = ____

# Create the parameter dictionary for each tree: params 
params = {"____":"____", "____":____}

# Create list of number of boosting rounds
num_rounds = [5, 10, 15]

# Empty list to store final round rmse per XGBoost model
final_rmse_per_round = []

# Iterate over num_rounds and build one model per num_boost_round parameter
for curr_num_rounds in num_rounds:

    # Perform cross-validation: cv_results
    cv_results = ____(dtrain=____, params=____, nfold=3, num_boost_round=____, metrics="rmse", as_pandas=True, seed=123)
    
    # Append final round RMSE
    ____.____(cv_results["test-rmse-mean"].tail().values[-1])

# Print the resultant DataFrame
num_rounds_rmses = list(zip(num_rounds, final_rmse_per_round))
print(pd.DataFrame(num_rounds_rmses, columns=["num_boosting_rounds", "rmse"]))
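
For reference, here is one way the completed exercise might look. This is a sketch based on the instructions above; it assumes X, y, xgboost as xgb, and pandas as pd are available, as they are in the exercise environment.

import pandas as pd
import xgboost as xgb

# Create the DMatrix: housing_dmatrix (X and y are assumed to be preloaded)
housing_dmatrix = xgb.DMatrix(data=X, label=y)

# Create the parameter dictionary for each tree: params
params = {"objective": "reg:squarederror", "max_depth": 3}

# Create list of number of boosting rounds
num_rounds = [5, 10, 15]

# Empty list to store final round rmse per XGBoost model
final_rmse_per_round = []

# Iterate over num_rounds and build one model per num_boost_round parameter
for curr_num_rounds in num_rounds:

    # Perform 3-fold cross-validation with the current number of boosting rounds
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        num_boost_round=curr_num_rounds, metrics="rmse",
                        as_pandas=True, seed=123)

    # Append final round RMSE (last row of the test-rmse-mean column)
    final_rmse_per_round.append(cv_results["test-rmse-mean"].tail().values[-1])

# Print the resultant DataFrame
num_rounds_rmses = list(zip(num_rounds, final_rmse_per_round))
print(pd.DataFrame(num_rounds_rmses, columns=["num_boosting_rounds", "rmse"]))

Because as_pandas=True makes xgb.cv() return a DataFrame with one row per boosting round (columns train-rmse-mean, train-rmse-std, test-rmse-mean, test-rmse-std), indexing the last row of test-rmse-mean picks out the final-round test RMSE for each model. You should generally see this RMSE drop as the number of boosting rounds grows, although the gains shrink with each additional round, which motivates the automated early stopping covered in the next exercise.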

This exercise is part of the course

Extreme Gradient Boosting with XGBoost


Learn the fundamentals of gradient boosting and build state-of-the-art machine learning models using XGBoost to solve classification and regression problems.

This chapter will teach you how to make your XGBoost models as performant as possible. You'll learn about the variety of parameters that can be adjusted to alter the behavior of XGBoost and how to tune them efficiently so that you can supercharge the performance of your models.

