Randomized search
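# clf, param_grid, X and y are defined in the previous exercise
from sklearn.model_selection import GridSearchCV
# Call GridSearchCV
grid_search = GridSearchCV(clf, param_grid)
# Fit the model
grid_search.fit(X, y)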
In the above chunk of code from the previous exercise, you may have noticed that the line creating the GridSearchCV object took almost no time to run, while the call to .fit() took several seconds to execute. This is because .fit() is what actually performs the grid search, and in our case the grid contained many different combinations.
As the hyperparameter grid gets larger, grid search becomes slower, because every combination has to be fitted and evaluated. To address this, instead of trying out every single combination of values, we can randomly jump around the grid and try only some of the combinations. There is a small possibility that we miss the best combination, but we save a lot of time, or are able to tune more hyperparameters in the same amount of time.
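To get a feel for the difference in workload, here is a minimal sketch using only the standard library. The grid below is hypothetical (it is not the grid from the exercise), and the budget of 10 candidates is an arbitrary choice for illustration.

```python
import itertools
import random

# Hypothetical hyperparameter grid, for illustration only
param_grid = {
    "max_depth": [3, 5, 10, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 5, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

# Grid search evaluates every combination of values
all_combinations = list(itertools.product(*param_grid.values()))
n_grid = len(all_combinations)

# Random search evaluates only a fixed budget of combinations,
# picked at random (here without replacement)
n_iter = 10  # assumed budget
rng = random.Random(0)
sampled = rng.sample(all_combinations, n_iter)

print(f"Grid search candidates:   {n_grid}")
print(f"Random search candidates: {len(sampled)}")
```

Adding one more hyperparameter with three values would triple the grid search workload, while the random search budget stays whatever you set it to.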
In scikit-learn, you can do this using RandomizedSearchCV. It has the same API as GridSearchCV, except that you need to specify a parameter distribution that it can sample from instead of specific hyperparameter values. Let's try it out now! The parameter distribution has been set up for you, along with a random forest classifier called clf.
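As a rough, self-contained sketch of what such a search can look like outside the exercise environment: the synthetic data, the name param_dist, the distribution ranges, and the settings n_iter=5 and cv=3 are all assumptions made for this illustration, not the values prepared for the exercise.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the churn data (assumption: the real X, y come from the course)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Small forest so the sketch runs quickly
clf = RandomForestClassifier(n_estimators=20, random_state=0)

# Hypothetical parameter distribution: lists are sampled uniformly,
# scipy distributions are sampled through their rvs() method
param_dist = {
    "max_depth": [3, None],
    "max_features": randint(1, 11),       # integers 1..10
    "min_samples_split": randint(2, 11),  # integers 2..10
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

# Same call pattern as GridSearchCV, plus a budget of sampled candidates (n_iter)
random_search = RandomizedSearchCV(
    clf, param_distributions=param_dist, n_iter=5, cv=3, random_state=0
)
random_search.fit(X, y)

print(random_search.best_params_)
```

Only n_iter candidates are fitted, so the run time is controlled by that budget rather than by the size of the search space.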
This exercise is part of the course Marketing Analytics: Predicting Customer Churn in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import RandomizedSearchCV