Setting up GridSearch parameters
A hyperparameter is a setting you choose before training, rather than a value the model learns from the data. For example, max_depth and min_samples_leaf are hyperparameters of DecisionTreeClassifier(), passed as arguments when you create the model. Hyperparameter tuning is the process of testing different values of the hyperparameters to find the optimal ones: the ones that give the best predictions according to your objectives. A grid search tests different combinations of hyperparameter values. Even better, in sklearn you can use GridSearchCV() to test the combinations and run cross-validation on each of them in one step!
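To make the idea concrete, here is a minimal sketch of a cross-validated grid search. The synthetic data from make_classification, the small grid of values, and cv=5 are assumptions chosen for illustration only; they are not the course dataset or the values used in this exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real dataset (assumption for illustration)
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Hypothetical small grid: 3 depths x 2 leaf sizes = 6 combinations
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [5, 20]}

# Each combination is scored with 5-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
n_candidates = len(search.cv_results_["params"])
print(best_params, n_candidates)
```

After fitting, best_params_ holds the combination with the best mean cross-validation score, and cv_results_ holds the scores for every combination tried.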
In this exercise, you are going to prepare the different values you want to test for max_depth and min_samples_leaf. You will then put these in a dictionary, because that’s what is required for GridSearchCV():
- the dictionary keys will be the hyperparameter names
- the dictionary values will be the lists of values you want to test for each hyperparameter
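As a rough illustration of why the dictionary has this shape, the sketch below expands a small, hypothetical dictionary into every combination of values using itertools.product. This only mimics the idea of a grid; it is not how GridSearchCV() is implemented, and the values are made up for the example.

```python
from itertools import product

# Hypothetical grid: keys are hyperparameter names, values are the candidates
param_grid = {"max_depth": [3, 5], "min_samples_leaf": [10, 20, 30]}

names = list(param_grid)

# One dictionary per combination of candidate values
combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

print(len(combinations))  # 2 depths x 3 leaf sizes = 6
print(combinations[0])    # {'max_depth': 3, 'min_samples_leaf': 10}
```

The number of combinations is the product of the list lengths, so adding values to any one hyperparameter multiplies the number of models to fit.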
Instead of writing all the values manually, you will use the range() function, which generates values incrementally. For example, range(1, 10, 2) generates the values from 1 (included) to 10 (not included), in increments of 2. Converted to a list, the final result is [1, 3, 5, 7, 9].
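The range() example can be checked directly. In Python 3, range() returns a lazy range object rather than a list, so the sketch below converts it with list(); the variable names are hypothetical.

```python
# start=1 is included, stop=10 is excluded, step=2
odd_values = list(range(1, 10, 2))
print(odd_values)  # [1, 3, 5, 7, 9]

# The stop value is never produced, even when the step would land on it
tens = list(range(0, 30, 10))
print(tens)  # [0, 10, 20]
```

To include the upper bound in the generated values, set stop to a number just past it.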
This exercise is part of the course
HR Analytics: Predicting Employee Churn in Python
Exercise instructions
- Following the format in the example above, generate values for the maximum depth ranging from 5 to 20 with increments of 1
- Do the same for the minimum sample size with values from 50 to 450 with increments of 50
- Create the dictionary by specifying the max_depth and min_samples_leaf values to try, using the variables you just created
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Generate values for maximum depth
depth = [i for i in ____(5,21,1)]
# Generate values for minimum sample size
samples = [i for i in range(____,500,____)]
# Create the dictionary with parameters to be checked
parameters = dict(max_depth=depth, min_samples_leaf=____)