
Introducing Hyperparameters

1. Hyperparameters Overview

In the previous lesson, you learned what parameters are. You will now learn what exactly hyperparameters are, how to find and set them, as well as some tips and tricks for prioritizing your efforts. Let's get started.

2. What is a hyperparameter

Hyperparameters are values that you set before the modeling process begins. You can think of them like the knobs and dials on an old radio: you tune the different dials and buttons and hope that a nice tune comes out. The algorithm does not learn the value of these during the modeling process. This is the crucial differentiator between hyperparameters and parameters: whether you set the value yourself, or the algorithm learns it and reports it back to you.

3. Hyperparameters in Random Forest

We can easily see the hyperparameters by creating an instance of the estimator and printing it out. Here we create the estimator with default settings and call the print function on our estimator. These are all the different knobs and dials we can set for our model. There are a lot! But what do they all mean? For this we need to turn to the Scikit Learn documentation.
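A minimal sketch of this step, assuming scikit-learn is installed. Note that newer scikit-learn versions print only non-default hyperparameters by default, so `get_params()` is a more reliable way to see the full list:

```python
from sklearn.ensemble import RandomForestClassifier

# Create the estimator with all default settings
rf_clf = RandomForestClassifier()

# print() may show only non-default hyperparameters on newer
# scikit-learn versions; get_params() always lists them all.
print(rf_clf)
print(rf_clf.get_params())
```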

4. A single hyperparameter

Let us take the example of the 'n_estimators' hyperparameter. The documentation tells us its data type and default value, and it also provides a definition of what it means.
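Besides the documentation, you can inspect a hyperparameter's default directly on a freshly created estimator; this is a small sketch, and the exact default for `n_estimators` depends on your scikit-learn version:

```python
from sklearn.ensemble import RandomForestClassifier

# Until we override it, the attribute holds the default value
rf_clf = RandomForestClassifier()
print(type(rf_clf.n_estimators))  # the data type (an int)
print(rf_clf.n_estimators)        # the version-dependent default
```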

5. Setting hyperparameters

We can set the hyperparameters when we create the estimator object. The default number of trees seems a little low, so let us set that to be 100. Whilst we are at it, let us also set the criterion to be 'entropy'. If we print out the model, we can see that the other default values remain the same, but the values we set explicitly overrode the defaults.
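The step above can be sketched as follows, assuming scikit-learn is available:

```python
from sklearn.ensemble import RandomForestClassifier

# Override two defaults when constructing the estimator
rf_clf = RandomForestClassifier(n_estimators=100, criterion='entropy')

# The explicitly set values take effect; all other
# hyperparameters keep their default values.
print(rf_clf.get_params()['n_estimators'])
print(rf_clf.get_params()['criterion'])
```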

6. Hyperparameters in Logistic Regression

What about our logistic regression model, what were the hyperparameters for that? We follow the same steps. Firstly, we create a logistic regression estimator. Then we print it out. We can see there are fewer hyperparameters for this model than for the Random Forest.
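A short sketch of the same steps for logistic regression, with a comparison of hyperparameter counts added for illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Create and inspect the logistic regression estimator
log_reg_clf = LogisticRegression()
print(log_reg_clf.get_params())

# Compare the number of hyperparameters of the two models
print(len(log_reg_clf.get_params()),
      len(RandomForestClassifier().get_params()))
```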

7. Hyperparameter Importance

Some hyperparameters are more important than others. But before we outline important ones, note that there are some hyperparameters that definitely will *not* help model performance. These are related to computational decisions or to what information to retain for analysis. With the random forest classifier, these hyperparameters will not assist model performance: how many cores to use will only speed up modeling time, and a random seed or whether to print out information as the modeling occurs also won't assist. Hence, some hyperparameters you don't need to 'train' during your work.
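To make this concrete, here is a sketch of the random forest hyperparameters in question; the values chosen are arbitrary examples:

```python
from sklearn.ensemble import RandomForestClassifier

# These hyperparameters affect speed, reproducibility, and logging,
# not predictive performance, so they are not worth tuning.
rf_clf = RandomForestClassifier(
    n_jobs=4,         # cores used to build trees in parallel (speed only)
    random_state=42,  # seed for reproducible results
    verbose=1,        # print progress information during fitting
)
print(rf_clf.get_params()['n_jobs'])
```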

8. Random Forest: Important Hyperparameters

There are some generally accepted important hyperparameters to tune for a Random Forest model. The n_estimators (how many trees in the forest) should be set to a high value; 500 or 1000 or even more is not uncommon (noting that there are computational costs to higher values). The max_features hyperparameter controls how many features to consider when splitting, which is vital to ensure tree diversity. The next two control overfitting of individual trees. The 'criterion' hyperparameter may have a small impact, but it is not generally a primary hyperparameter to consider. Remember, this is just a guide and your particular problem may require attention on other hyperparameters.
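A sketch of such a tuning-oriented starting point. The lesson does not name the two overfitting-related hyperparameters; `max_depth` and `min_samples_leaf` are common choices and are an assumption here, and the specific values are illustrative, not prescriptions:

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative starting values for the important hyperparameters
rf_clf = RandomForestClassifier(
    n_estimators=500,     # many trees; higher values cost more compute
    max_features='sqrt',  # features considered per split, for tree diversity
    max_depth=10,         # caps tree depth to limit overfitting
    min_samples_leaf=5,   # requires larger leaves, also limiting overfitting
)
```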

9. How to find hyperparameters that matter?

There are hundreds of machine learning algorithms out there and learning which hyperparameters matter is knowledge you will build over time from a variety of sources. For example, there are some great academic papers where people have tried many combinations of hyperparameters for a specific algorithm on many datasets. These can be a very informative read! You can also find great blogs and tutorials online and consult the Scikit Learn documentation. Of course, one of the best ways to learn is just more practical experience! It is important you research this yourself to build your knowledge base for efficient modeling.

10. Let's practice!

Let's explore some hyperparameters in the exercises!