1. Bias-variance tradeoff
Now that we have laid the foundations of modeling, you are going to use your new skills to understand a very important concept in machine learning.
2. Hyperparameters
When creating a model, the modeler - that's you! - chooses parameters that define the learning process, called hyperparameters. One example is the tree depth, which controls how many splits are made until a decision is reached. Details can be found in the documentation, as you see here. Stay tuned for chapter three, where we dive deeper into these hyperparameters!
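As a rough sketch of what this looks like in code (assuming the tidymodels packages and the rpart engine), tree_depth is something you set yourself rather than something the model learns, and ?decision_tree pulls up the documentation page mentioned above.

```r
library(tidymodels)

# The modeler chooses this value before training; it is not learned from data
tree_spec <- decision_tree(tree_depth = 2) %>%
  set_engine("rpart") %>%
  set_mode("regression")

# The documentation describes tree_depth and the other hyperparameters
?decision_tree
```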
3. Impact on model complexity
Using hyperparameters, you control how simple or complex the structure of your model is, as seen in these two examples. We fit one model that constructs 2 levels from the root to the leaf nodes (that's what tree_depth means) and another with 15 levels to our training data. The impact on the final model is huge.
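As a hedged sketch (the data frame name chocolate_train and the formula final_grade ~ . are assumptions here), fitting the two trees of different depth could look like this:

```r
library(tidymodels)

# A shallow tree (2 levels from root to leaves) and a deep tree (15 levels)
shallow_spec <- decision_tree(tree_depth = 2) %>%
  set_engine("rpart") %>%
  set_mode("regression")
deep_spec <- decision_tree(tree_depth = 15) %>%
  set_engine("rpart") %>%
  set_mode("regression")

# Fit both specifications to the same (assumed) training data
shallow_model <- fit(shallow_spec, final_grade ~ ., data = chocolate_train)
deep_model    <- fit(deep_spec,    final_grade ~ ., data = chocolate_train)
```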
4. Complex model - overfitting - high variance
Imagine you want to predict the final_grade in your chocolate dataset.
You build a very complex decision tree and find that it fits your training data surprisingly well. In this example, the mean absolute error, calculated by the mae() function, is only around 0-point-2.
However, when you check your model on your test set, you observe very large errors. In this case, we say that your model overfits the data.
We call this effect 'high variance'.
Our very sophisticated decision tree learned the structure of the training data too well and cannot adapt to the different structure of the test set.
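One way this check might look in code (a sketch; deep_model, chocolate_train, and chocolate_test are assumed names carried over from the setup above):

```r
library(tidymodels)

# In-sample error: predict on the same data the deep tree was trained on
predict(deep_model, new_data = chocolate_train) %>%
  bind_cols(chocolate_train) %>%
  mae(truth = final_grade, estimate = .pred)

# Out-of-sample error: predict on the unseen test set
predict(deep_model, new_data = chocolate_test) %>%
  bind_cols(chocolate_test) %>%
  mae(truth = final_grade, estimate = .pred)

# A tiny training MAE combined with a much larger test MAE indicates
# overfitting, i.e. high variance
```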
5. Simple model - underfitting - high bias
It's easy to simplify your decision tree - for example, by using only a few features or columns of the training set.
After fitting the new tree and binding together the errors of the training and test sets - ouch - all of a sudden you have large errors in both cases.
The simple tree is not able to capture the complexity of the training or test set very well.
You underfit the data. We call this effect 'high bias'.
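A possible sketch of this comparison (the single predictor cocoa_percent is purely illustrative, and the data frame names are assumptions as before):

```r
library(tidymodels)

# A deliberately simple tree using only one (hypothetical) predictor column
simple_spec  <- decision_tree(tree_depth = 2) %>%
  set_engine("rpart") %>%
  set_mode("regression")
simple_model <- fit(simple_spec, final_grade ~ cocoa_percent, data = chocolate_train)

# Bind the training and test errors together for a side-by-side look
bind_rows(
  train = predict(simple_model, new_data = chocolate_train) %>%
    bind_cols(chocolate_train) %>%
    mae(truth = final_grade, estimate = .pred),
  test = predict(simple_model, new_data = chocolate_test) %>%
    bind_cols(chocolate_test) %>%
    mae(truth = final_grade, estimate = .pred),
  .id = "set"
)
# Large MAE values on both sets indicate underfitting, i.e. high bias
```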
6. The bias-variance tradeoff
This chart shows the relationship between these two observations.
For very simple models, we usually observe high bias and low variance.
For overly complex models, we observe very low bias, but high variance.
This tradeoff in complexity is called the bias-variance tradeoff.
You need to find the right balance of model complexity without overfitting or underfitting the data: the sweet spot in the center.
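As a side note that goes beyond this course (the decomposition below is stated for squared-error loss, not the mean absolute error used in our examples), the tradeoff is often written as a decomposition of the expected prediction error:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \mathrm{Bias}\big[\hat{f}(x)\big]^2 \;+\; \mathrm{Var}\big[\hat{f}(x)\big] \;+\; \sigma^2
$$

Simple models tend to have a large bias term, complex models a large variance term, and the irreducible noise $\sigma^2$ cannot be reduced by any model; the sweet spot minimizes the sum of the first two terms.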
7. Detecting overfitting
What does all this mean in practice?
As you have learned in the previous lesson, cross-validation is great for creating many training and test splits during model development without ever touching the final test set.
Suppose you fitted models on all your CV folds.
By using the collect_metrics() function on your resampling results, you find that the mean out-of-sample error across your folds is 2-point-4.
You also fit the model to the whole training set and measure an in-sample error (using the mae() function) of 0-point-2.
Because the CV error is much higher than the training error, you can deduce that your model suffers from overfitting. You should decrease the complexity of the tree, for example by reducing the number of features or the tree depth.
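A sketch of this workflow (the fold count, the object names, and the rpart engine are assumptions):

```r
library(tidymodels)

# Assumed: 10-fold cross-validation built from the training set
chocolate_folds <- vfold_cv(chocolate_train, v = 10)

tree_spec <- decision_tree(tree_depth = 15) %>%
  set_engine("rpart") %>%
  set_mode("regression")

# Fit the specification to every fold and collect the out-of-sample MAE
cv_results <- fit_resamples(
  tree_spec,
  final_grade ~ .,
  resamples = chocolate_folds,
  metrics   = metric_set(mae)
)
collect_metrics(cv_results)   # mean out-of-sample MAE across the folds

# In-sample MAE of the model fit to the whole training set, for comparison
full_model <- fit(tree_spec, final_grade ~ ., data = chocolate_train)
predict(full_model, new_data = chocolate_train) %>%
  bind_cols(chocolate_train) %>%
  mae(truth = final_grade, estimate = .pred)
# An out-of-sample error far above the in-sample error points to overfitting
```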
8. Detecting underfitting
Suppose now that you did that and re-fit your model to the whole training set.
Again, you calculate the in-sample mean absolute error and see that it is very high.
That's a clear indicator that your model now suffers from underfitting.
You simplified the model too much and should increase its complexity.
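Continuing the sketch under the same assumed names, the underfitting check is simply the in-sample error of the simplified tree:

```r
library(tidymodels)

# A simplified tree specification, re-fit to the whole training set
simple_spec  <- decision_tree(tree_depth = 2) %>%
  set_engine("rpart") %>%
  set_mode("regression")
simple_model <- fit(simple_spec, final_grade ~ ., data = chocolate_train)

# In-sample MAE: if this is already high, the model underfits and
# you should add complexity back (more features, deeper tree)
predict(simple_model, new_data = chocolate_train) %>%
  bind_cols(chocolate_train) %>%
  mae(truth = final_grade, estimate = .pred)
```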
9. Let's trade off!
If you think this is tricky, rest assured: you will understand this step by step in the exercises.