Hyperparameter tuning

1. Hyperparameter tuning

Welcome to the final chapter of the course. Congratulations on making it this far. We are already familiar with the key approaches to tune and evaluate our model. However, one question you may have been asking until now is how we decide whether, for example, the maximum depth of the tree should be set to 5, 6, 10, or any other value. The same goes for the other parameters covered so far. The answer is very simple: we just try different values and find the one that provides the best possible predictions.

2. GridSearch

Maximum depth, minimum sample size, and other similar parameters that need to be tuned to find the best value are known as hyperparameters. To find the optimal values for those hyperparameters, one needs to create a grid, a list of candidate values to test, and then search among those values for the one that achieves the highest accuracy. For example, the maximum depth should not attain very high values, as the tree will start to overfit, but very low values are not acceptable either, as they may produce biased and less accurate predictions. For that reason, let's try to find the optimal value between 5 and 20. Similarly, for the minimum sample size in the leaf nodes, let's check values between 50 and 450 with a step of 50. Once those values are generated inside a list, the only thing left is to train a Decision Tree for every possible combination of those values and compare them to find the values that provide the best performance on the test set. This process is known as GridSearch and, while it may sound confusing, the implementation in Python using sklearn is fairly easy, as the sketch below illustrates.
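
One possible way to set this up with sklearn's GridSearchCV might look as follows. This is only a minimal sketch: the synthetic dataset created with make_classification is a hypothetical stand-in for the course data, while the grid values mirror the ranges discussed above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the course dataset
X, y = make_classification(n_samples=5000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The grid: a list of candidate values for each hyperparameter
param_grid = {
    "max_depth": list(range(5, 21)),               # maximum depth between 5 and 20
    "min_samples_leaf": list(range(50, 451, 50)),  # leaf size from 50 to 450, step 50
}

# GridSearchCV fits a Decision Tree for every combination in the grid
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, scoring="accuracy")
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```

Note that GridSearchCV scores each combination internally before we ever touch the held-out test set, which is where cross-validation, covered next, comes in.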

3. Cross-Validation

While the Train/test split ensures that the model does not overfit the training component, hyperparameter tuning may result in overfitting the test component. As a solution, one is encouraged to validate the model on different test components, which is achieved using **Cross Validation**. The latter is a general case of the Train/test split, as it splits the data into **k** components or folds, where each component gets the opportunity of being the **test component**. In the example picture, we have 5 folds, and during each iteration one of the components serves as the test component while the others are used for training. Having different folds ensures that our model does not overfit the test component. This is exactly what GridSearch in sklearn uses to decide which model is better, as sketched below.
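
A minimal sketch of k-fold cross-validation in sklearn might look like the following. Again, the synthetic data is a hypothetical stand-in for the course dataset; cross_val_score reports one score per fold, and the same cv argument controls the number of folds inside GridSearchCV.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the course dataset
X, y = make_classification(n_samples=5000, n_features=10, random_state=42)

tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=50, random_state=42)

# 5-fold cross-validation: each fold takes a turn as the test component
scores = cross_val_score(tree, X, y, cv=5, scoring="accuracy")
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())

# GridSearchCV combines the grid with cross-validation via the cv argument
param_grid = {"max_depth": [5, 10, 15, 20], "min_samples_leaf": [50, 100, 200]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
```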

4. Let's practice!

Now let's try some examples.