2D grid search
The drawback of tuning each hyperparameter independently is that it ignores potential dependencies between hyperparameters. A better approach is to try all possible hyperparameter combinations. However, the search space then grows rapidly. For example, 2 hyperparameters with 10 possible values each yield 100 experiment runs.
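To make the growth concrete, here is a minimal sketch that counts the combinations produced by itertools.product(). The grids grid_a, grid_b and grid_c are hypothetical placeholders, not hyperparameters from the exercise.

```python
import itertools

# Hypothetical grids with 10 candidate values each
grid_a = list(range(10))
grid_b = list(range(10))

# product() yields every (a, b) pair, so the count is the product of the grid sizes
combinations = list(itertools.product(grid_a, grid_b))
print(len(combinations))  # 100

# A third 10-value grid multiplies the count again
grid_c = list(range(10))
n_three = len(list(itertools.product(grid_a, grid_b, grid_c)))
print(n_three)  # 1000
```

Each extra hyperparameter multiplies the number of runs by the size of its grid, which is why full grid search is usually limited to a few hyperparameters at a time.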
Your goal is to find the best pair of max_depth and subsample values for the Gradient Boosting model. subsample is the fraction of observations used for fitting the individual trees.
You're given a function get_cv_score(), which takes the train dataset and a dictionary of the model parameters as arguments and returns the overall validation RMSE score over 3-fold cross-validation.
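The body of get_cv_score() is not shown in this exercise. As a rough illustration of what an RMSE score over 3-fold cross-validation means, the sketch below uses a trivial mean predictor in place of the Gradient Boosting model and averages the per-fold RMSE values. The helper names rmse and three_fold_rmse, the fold assignment, and the averaging step are all assumptions made for illustration.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error between two equal-length sequences
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def three_fold_rmse(targets, n_folds=3):
    """Mean validation RMSE over n_folds, with a mean predictor as a stand-in model."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th observation goes to the validation fold
        val = [y for i, y in enumerate(targets) if i % n_folds == fold]
        trn = [y for i, y in enumerate(targets) if i % n_folds != fold]
        prediction = sum(trn) / len(trn)  # "fit" the stand-in model on the training folds
        fold_scores.append(rmse(val, [prediction] * len(val)))
    return sum(fold_scores) / n_folds

toy_targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = three_fold_rmse(toy_targets)
print(score)
```

Because RMSE is an error metric, a lower score indicates a better hyperparameter combination.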
This exercise is part of the course Winning a Kaggle Competition in Python.
Exercise instructions
- Specify the grids for possible max_depth and subsample values. For max_depth: 3, 5 and 7. For subsample: 0.8, 0.9 and 1.0.
- Apply the product() function from the itertools module to the hyperparameter grids. It returns all possible combinations for these two grids.
- Pass each candidate pair of hyperparameters to the model params dictionary.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import itertools

# Hyperparameter grids
max_depth_grid = [____]
subsample_grid = [____]
results = {}

# For each couple in the grid
for max_depth_candidate, subsample_candidate in itertools.____(max_depth_grid, subsample_grid):
    params = {'max_depth': ____,
              'subsample': ____}
    validation_score = get_cv_score(train, params)
    # Save the results for each couple
    results[(max_depth_candidate, subsample_candidate)] = validation_score

print(results)
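Once results is filled, the best pair is the key with the lowest RMSE. The sketch below runs the whole loop end to end and selects that pair. Since the real get_cv_score() and train are not available here, it substitutes a made-up scoring function and a placeholder dataset; both are assumptions for illustration, and the scores it produces say nothing about the actual model.

```python
import itertools

def get_cv_score(train, params):
    # Hypothetical stand-in, NOT the course's function: a made-up smooth
    # surface whose minimum sits at max_depth=5, subsample=0.9
    return (5.0
            + 0.1 * (params['max_depth'] - 5) ** 2
            + 10.0 * (params['subsample'] - 0.9) ** 2)

train = None  # placeholder; the stand-in ignores the data

# Grids taken from the exercise instructions
max_depth_grid = [3, 5, 7]
subsample_grid = [0.8, 0.9, 1.0]
results = {}

for max_depth_candidate, subsample_candidate in itertools.product(max_depth_grid, subsample_grid):
    params = {'max_depth': max_depth_candidate,
              'subsample': subsample_candidate}
    results[(max_depth_candidate, subsample_candidate)] = get_cv_score(train, params)

# RMSE is an error metric, so the best pair has the lowest score
best_pair = min(results, key=results.get)
print(len(results), best_pair)
```

Keying the dictionary by the (max_depth, subsample) tuple keeps each score tied to the exact combination that produced it, so min() with key=results.get returns the winning pair directly.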