
2D grid search

The drawback of tuning each hyperparameter independently is that it ignores potential dependencies between hyperparameters. A better approach is to try all possible hyperparameter combinations. However, the grid search space then grows rapidly: for example, 2 parameters with 10 possible values each already yield 100 experiment runs.
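As a quick illustration of how fast the grid grows, itertools.product() enumerates every combination of two grids, so two grids of 10 values each produce 100 candidates (the grid names below are made up for this example):

import itertools

# Two hypothetical grids with 10 candidate values each
grid_a = list(range(1, 11))
grid_b = [i / 10 for i in range(1, 11)]

# Every (a, b) pair from the two grids: 10 * 10 = 100 combinations
combinations = list(itertools.product(grid_a, grid_b))
print(len(combinations))  # 100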

Your goal is to find the best pair of max_depth and subsample values for the Gradient Boosting model. subsample is the fraction of observations used to fit the individual trees.

You're given a function get_cv_score(), which takes the train dataset and a dictionary of model parameters as arguments and returns the overall validation RMSE score over 3-fold cross-validation.
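get_cv_score() is provided by the exercise environment, so you do not need to implement it. For intuition only, a minimal sketch of such a helper, assuming scikit-learn, a GradientBoostingRegressor, and a train DataFrame with the target in a 'target' column (all assumptions, not the course's actual implementation), could look like this:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

def get_cv_score(train, params):
    # Hypothetical helper: mean validation RMSE over 3-fold cross-validation
    X = train.drop('target', axis=1)
    y = train['target']
    model = GradientBoostingRegressor(**params, random_state=123)
    # scikit-learn maximizes scores, so RMSE is exposed as its negative
    neg_rmse = cross_val_score(model, X, y, cv=3,
                               scoring='neg_root_mean_squared_error')
    return -np.mean(neg_rmse)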

This exercise is part of the course

Winning a Kaggle Competition in Python

Instructions

  • Specify the grids for possible max_depth and subsample values. For max_depth: 3, 5 and 7. For subsample: 0.8, 0.9 and 1.0.
  • Apply the product() function from the itertools package to the hyperparameter grids. It returns all possible combinations for these two grids.
  • Pass each candidate hyperparameter pair to the model's params dictionary.

Hands-on interactive exercise

Try this exercise by completing this sample code.

import itertools

# Hyperparameter grids
max_depth_grid = [____]
subsample_grid = [____]
results = {}

# For each pair in the grid
for max_depth_candidate, subsample_candidate in itertools.____(max_depth_grid, subsample_grid):
    params = {'max_depth': ____,
              'subsample': ____}
    validation_score = get_cv_score(train, params)
    # Save the results for each pair
    results[(max_depth_candidate, subsample_candidate)] = validation_score
print(results)
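For reference, one way to complete the template, following the instructions above and assuming train and get_cv_score() are already defined as in the exercise environment, could look like this:

import itertools

# Hyperparameter grids from the instructions
max_depth_grid = [3, 5, 7]
subsample_grid = [0.8, 0.9, 1.0]
results = {}

# Try every (max_depth, subsample) pair
for max_depth_candidate, subsample_candidate in itertools.product(max_depth_grid, subsample_grid):
    params = {'max_depth': max_depth_candidate,
              'subsample': subsample_candidate}
    validation_score = get_cv_score(train, params)
    # Save the RMSE for each pair
    results[(max_depth_candidate, subsample_candidate)] = validation_score
print(results)

# Lower RMSE is better, so the best pair minimizes the validation score
best_pair = min(results, key=results.get)
print(best_pair)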