Grid search with XGBoost
Now that you've learned how to tune parameters individually with XGBoost, let's take your parameter tuning to the next level by using scikit-learn's grid search and randomized search capabilities with internal cross-validation, via the `GridSearchCV` and `RandomizedSearchCV` classes. You will use these to find the best model from a collection of possible parameter values across multiple parameters simultaneously. Let's get to work, starting with `GridSearchCV`!
This is a part of the course “Extreme Gradient Boosting with XGBoost”.
Exercise instructions
- Create a parameter grid called `gbm_param_grid` that contains a list of `"colsample_bytree"` values (`0.3`, `0.7`), a list with a single value for `"n_estimators"` (`50`), and a list of 2 `"max_depth"` values (`2`, `5`).
- Instantiate an `XGBRegressor` object called `gbm`.
- Create a `GridSearchCV` object called `grid_mse`, passing in: the parameter grid to `param_grid`, the `XGBRegressor` to `estimator`, `"neg_mean_squared_error"` to `scoring`, and `4` to `cv`. Also specify `verbose=1` so you can better understand the output.
- Fit the `GridSearchCV` object to `X` and `y`.
- Print the best parameter values and lowest RMSE, using the `.best_params_` and `.best_score_` attributes, respectively, of `grid_mse`.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
'____': [____, ____],
'____': [____],
'____': [____, ____]
}
# Instantiate the regressor: gbm
gbm = ____
# Perform grid search: grid_mse
grid_mse = ____
# Fit grid_mse to the data
____
# Print the best parameters and lowest RMSE
print("Best parameters found: ", ____)
print("Lowest RMSE found: ", np.sqrt(np.abs(____)))