Grid search with XGBoost
Now that you've learned how to tune parameters individually with XGBoost, let's take your parameter tuning to the next level by using scikit-learn's grid search and randomized search capabilities with internal cross-validation, via the `GridSearchCV` and `RandomizedSearchCV` classes. You will use these to find the best model from a collection of possible parameter values across multiple parameters simultaneously. Let's get to work, starting with `GridSearchCV`!
This is a part of the course “Extreme Gradient Boosting with XGBoost”.
Exercise instructions
- Create a parameter grid called `gbm_param_grid` that contains a list of `"colsample_bytree"` values (`0.3`, `0.7`), a list with a single value for `"n_estimators"` (`50`), and a list of 2 `"max_depth"` values (`2`, `5`).
- Instantiate an `XGBRegressor` object called `gbm`.
- Create a `GridSearchCV` object called `grid_mse`, passing in: the parameter grid to `param_grid`, the `XGBRegressor` to `estimator`, `"neg_mean_squared_error"` to `scoring`, and `4` to `cv`. Also specify `verbose=1` so you can better understand the output.
- Fit the `GridSearchCV` object to `X` and `y`.
- Print the best parameter values and lowest RMSE, using the `.best_params_` and `.best_score_` attributes, respectively, of `grid_mse`.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
'____': [____, ____],
'____': [____],
'____': [____, ____]
}
# Instantiate the regressor: gbm
gbm = ____
# Perform grid search: grid_mse
grid_mse = ____
# Fit grid_mse to the data
____
# Print the best parameters and lowest RMSE
print("Best parameters found: ", ____)
print("Lowest RMSE found: ", np.sqrt(np.abs(____)))