
Introducing Grid Search

1. Introducing Grid Search

In this section, we will extend our work on automatic hyperparameter tuning and learn what a Grid Search is. Let's get started!

2. Automating 2 Hyperparameters

Let's remind ourselves of our previous work using a for loop to test different values for the number of neighbors in a KNN algorithm. We then collated the results into a DataFrame to analyze, as in the sketch below. For this section we are working with a reduced dataset, so you may see slightly different results.
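A minimal sketch of that earlier loop, assuming X_train, X_test, y_train and y_test have already been created; the neighbor values here are illustrative:

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd

results = []
for n in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=n)
    knn.fit(X_train, y_train)
    # Record the hyperparameter value alongside its test accuracy
    results.append([n, knn.score(X_test, y_test)])

results_df = pd.DataFrame(results, columns=['n_neighbors', 'accuracy'])
print(results_df)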

3. Automating 2 Hyperparameters

But what if we want to test different values of two hyperparameters? Let's take the example of a GBM algorithm, which has a few more hyperparameters to tune than the KNN or Random Forest algorithms. Say we want to tune two hyperparameters, learning_rate and max_depth, over sets of values like those below. How would you do that? One suggestion could be a nested loop.
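For illustration, the candidate values might look something like this (the specific numbers are assumptions, not a recommended grid):

# Candidate values for the two GBM hyperparameters
learning_rates = [0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.5]
max_depths = [4, 6, 8, 10]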

4. Automating 2 Hyperparameters

We can first write nicer code by turning the model-creation component into a function. We feed in the two hyperparameter values as arguments and use them to create a model. Then we fit it to our data and generate predictions. Finally, we return the hyperparameter values used and the score in a list for analysis.
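A sketch of such a function, reusing the training and test data from above; the function name and the accuracy metric are illustrative choices:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

def gbm_grid_search(learning_rate, max_depth):
    # Create a model from the two hyperparameter values
    model = GradientBoostingClassifier(learning_rate=learning_rate,
                                       max_depth=max_depth)
    # Fit to the training data and generate predictions
    predictions = model.fit(X_train, y_train).predict(X_test)
    # Return the hyperparameter values used and the score
    return [learning_rate, max_depth,
            accuracy_score(y_test, predictions)]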

5. Automating 2 Hyperparameters

Now we can loop through, calling our function and appending the results to a list as we go. Because the loop is nested, we test every value of our first hyperparameter against every value of our second hyperparameter.
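Continuing the sketch, the nested loop might look like this:

results_list = []
# The nested loop tests every learning rate with every max depth
for lr in learning_rates:
    for md in max_depths:
        results_list.append(gbm_grid_search(lr, md))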

6. Automating 2 Hyperparameters

We can save these results into a DataFrame as well, and then print it out to view.
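For example:

results_df = pd.DataFrame(results_list,
                          columns=['learning_rate', 'max_depth', 'accuracy'])
print(results_df)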

7. How many models?

You will notice that many more models are built when we add more hyperparameters and values to test. Importantly, the relationship between the number of models created and the number of hyperparameters and values to test is not linear but multiplicative: for each value tested for the first hyperparameter, you test every value of the second hyperparameter. This means that testing 5 values for the first hyperparameter and 10 values for the second gives us 50 models to run. And what if we k-fold cross-validated each model 10 times? That would be 500 models to run!

8. From 2 to N hyperparameters

That was just for 2 hyperparameters. What if we wanted to test a third or fourth hyperparameter? We could nest again (and again). We first list the extra values to test.
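For example, we might add the subsample hyperparameter (the values below are again illustrative):

# Extra values to test for a third hyperparameter
subsamples = [0.4, 0.6, 0.8, 1.0]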

9. From 2 to N hyperparameters

Then we adjust our function to take in more inputs. Notice how our function now builds a more complex model but is otherwise very similar to what we did before?
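A sketch of the adjusted function, assuming the same setup as before:

def gbm_grid_search_extended(learning_rate, max_depth, subsample):
    # A more complex model build, but the same overall pattern
    model = GradientBoostingClassifier(learning_rate=learning_rate,
                                       max_depth=max_depth,
                                       subsample=subsample)
    predictions = model.fit(X_train, y_train).predict(X_test)
    return [learning_rate, max_depth, subsample,
            accuracy_score(y_test, predictions)]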

10. From 2 to N hyperparameters

Finally, we can adjust our for loop to add the extra level of nesting. This code will also look familiar: we are just adding more levels of nesting but still saving out our results for analysis.
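Continuing the sketch:

results_list = []
for lr in learning_rates:
    for md in max_depths:
        # The extra level of nesting for the third hyperparameter
        for ss in subsamples:
            results_list.append(gbm_grid_search_extended(lr, md, ss))

results_df = pd.DataFrame(results_list,
                          columns=['learning_rate', 'max_depth',
                                   'subsample', 'accuracy'])
print(results_df)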

11. From 2 to N hyperparameters

So how many models did we just create? Testing 7 values for our first hyperparameter, and the listed numbers of values for the others, we can see this number has greatly increased. Safe to say, we cannot keep nesting forever, as our code becomes complex and inefficient. Plus, what if we also wanted some extra information, such as training and testing times and scores? Our code would get quite complex.

12. Introducing Grid Search

Let's review our work in an alternate way. Imagine we create a grid with each value of max_depth that we want to test down the left-hand side and each value of learning_rate across the top. The cell at each intersection is a model that we need to run.

13. Introducing Grid Search

Running a model for every cell in the grid, using the hyperparameter values that the cell specifies, is known as a Grid Search. For example, a given cell is equivalent to creating a gradient boosting estimator with that cell's inputs.
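For instance, the cell at max_depth=6 and learning_rate=0.1 (illustrative values) corresponds to this single estimator:

model = GradientBoostingClassifier(learning_rate=0.1, max_depth=6)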

14. Grid Search Pros & Cons

Grid search has a number of advantages. It is programmatic and saves many lines of code. It is guaranteed to find the best model within the grid you specify, but if you specify a poor grid with silly or conflicting values, you won't get a good score! Finally, it is an easy methodology to explain compared to some of the more complex ones we will cover later in the course.

15. Grid Search Pros & Cons

However, there are some disadvantages to this approach. It is computationally expensive. It is also 'uninformed': it doesn't learn as it creates models, so the next model it creates could be better or worse than the last. There are 'informed' methods that get better as they build more and more models, and we will see those later in the course.

16. Let's practice!

Let's now practice undertaking a grid search!