2D grid search

The drawback of tuning each hyperparameter independently is a potential dependency between different hyperparameters. The better approach is to try all the possible hyperparameter combinations. However, in such cases, the grid search space is rapidly expanding. For example, if we have 2 parameters with 10 possible values, it will yield 100 experiment runs.

Your goal is to find the best hyperparameter couple of max_depth and subsample for the Gradient Boosting model. subsample is a fraction of observations to be used for fitting the individual trees.

You're given a function get_cv_score(), which takes the train dataset and dictionary of the model parameters as arguments and returns the overall validation RMSE score over 3-fold cross-validation.

This exercise is part of the course

Winning a Kaggle Competition in Python

View Course

Exercise instructions

Specify the grids for possible max_depth and subsample values. For max_depth: 3, 5 and 7. For subsample: 0.8, 0.9 and 1.0.
Apply the product() function from the itertools package to the hyperparameter grids. It returns all possible combinations for these two grids.
Pass each hyperparameters candidate couple to the model params dictionary.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import itertools

# Hyperparameter grids
max_depth_grid = [____]
subsample_grid = [____]
results = {}

# For each couple in the grid
for max_depth_candidate, subsample_candidate in itertools.____(max_depth_grid, subsample_grid):
    params = {'max_depth': ____,
              'subsample': ____}
    validation_score = get_cv_score(train, params)
    # Save the results for each couple
    results[(max_depth_candidate, subsample_candidate)] = validation_score   
print(results)

Edit and Run Code