

1. Tuning your model

Welcome to the final chapter! It's now time to learn how to tune your model so that it can perform as well as possible before you start using it to make business decisions. You'll do this using a hyperparameter tuning technique known as grid-search cross-validation. But first, what are hyperparameters?

2. Refresher

Remember this support vector machine classifier from the last chapter? Notice the output: what do terms like "C", "degree", and "gamma" mean? These are all hyperparameters that are set BEFORE the model is trained, and their values inform how the model learns from the data. Each machine learning algorithm has its own specific hyperparameters.
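
As a rough sketch of the idea (not the exact code from the slide), instantiating a scikit-learn SVC and inspecting its parameters reveals these hyperparameters and their defaults, all of which exist before any training happens:

    # Inspect an SVM classifier's hyperparameters (defaults vary by scikit-learn version)
    from sklearn.svm import SVC

    svc = SVC()
    print(svc.get_params())
    # Includes entries like 'C', 'degree', and 'gamma'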

3. Random forest hyperparameters

As an example, these are some of the hyperparameters of a random forest model. For more on what these mean, I encourage you to dive into the scikit-learn documentation and check out DataCamp's Machine Learning curriculum. For our purposes, what we're interested in is choosing the optimal hyperparameters that lead to the best-performing model. The models you have created so far used the default hyperparameters, and now you will learn to tune them. One of the most effective ways to do this is a technique called grid search.
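
To make that concrete, here is a minimal sketch (not the course's exact code) of how to list a random forest's hyperparameters in scikit-learn; the specific defaults depend on your version:

    # List a random forest's tunable hyperparameters and their current values
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier()
    print(rf.get_params())
    # Includes entries such as 'n_estimators', 'max_depth', 'max_features', and 'criterion'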

4. Grid search

Grid search is a brute-force search through the hyperparameter space to find the optimal value for the hyperparameter of interest. Essentially, it tries a range of different possible hyperparameter values, fits models separately using these different values, and then returns the hyperparameters that lead to the best model fit. This range of hyperparameter values is the "grid" that it searches through. If you specify a large grid, then the search will take longer to run. Scikit-learn's implementation of grid search uses a technique called cross-validation to ensure the models are being tested on unseen data.
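
To illustrate the cross-validation part on its own, here is a small sketch using a toy dataset (an assumption for illustration, not the course's data): cross_val_score fits and scores the model on several train/test splits, so every fold serves once as unseen test data.

    # Plain cross-validation: the building block grid search relies on
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=42)  # toy data for illustration
    clf = RandomForestClassifier(random_state=42)

    scores = cross_val_score(clf, X, y, cv=5)  # 5 folds, each held out once as test data
    print(scores.mean())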

5. Grid search in sklearn

To use it, import GridSearchCV from sklearn.model_selection. Then create a dictionary, called param_grid here, in which the keys are the hyperparameter names, such as n_estimators, and the values are lists or arrays of the candidate values you wish to tune over. Here, we create an array between 10 and 50 for the n_estimators hyperparameter. You can also specify more than one hyperparameter, of course; if you do, then all possible combinations will be tried. After creating the hyperparameter grid, pass your instantiated classifier and the dictionary to GridSearchCV. This returns an object that you can fit to the data, just like any other scikit-learn estimator, and it is this fit that performs the grid search. Then, you can use the best_params_ attribute to retrieve the hyperparameters that perform best. In this case, it looks like the best value for n_estimators was 43. You can also use the best_score_ attribute to see how the model performs with this hyperparameter.
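
Putting that workflow together, here is a sketch of the steps just described; the toy dataset and the exact grid are placeholders rather than the course's own data.

    # Grid search over n_estimators with 5-fold cross-validation
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, random_state=42)  # placeholder data

    # Keys are hyperparameter names; values are the candidate settings to try
    param_grid = {'n_estimators': np.arange(10, 51)}

    clf = RandomForestClassifier(random_state=42)
    clf_cv = GridSearchCV(clf, param_grid, cv=5)

    clf_cv.fit(X, y)  # fitting is what actually runs the grid search

    print(clf_cv.best_params_)  # e.g. {'n_estimators': 43} in the slide
    print(clf_cv.best_score_)   # mean cross-validated score of the best model

Note that the total number of model fits grows with the grid size times the number of folds, which is why a large grid takes longer to run.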

6. Happy tuning!

For now, it's time to do some grid searching. Happy tuning!