1. The effectiveness of gradual learning
Hi machine learner! Welcome to our third chapter! It's time to add another ensemble method to your inventory: Boosting, which is based on a technique known as gradual learning.
2. Collective vs gradual learning
The ensemble methods you've seen so far are based on an idea known as collective learning - that is, the wisdom of the crowd. The idea is that the combined prediction of individual models is superior to any of the individual predictions on their own.
For collective learning to be effective, the estimators need to be independent and uncorrelated.
In addition, all the estimators are learning the same task, for the same goal: to predict the target variable given the features.
Because the estimators are independent, they can be trained in parallel to speed up model building.
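To make this concrete, here is a minimal sketch, assuming scikit-learn, of how a collective ensemble such as bagging trains its independent estimators in parallel; the dataset and parameter values are placeholders rather than anything used in this course.

```python
# A minimal sketch of collective learning: independent estimators
# trained in parallel (dataset and hyperparameters are placeholders).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# Each base tree is built on its own bootstrap sample, independently of
# the others, so n_jobs=-1 can fit them on all available cores at once.
bagging = BaggingClassifier(
    n_estimators=100,   # number of independent estimators
    n_jobs=-1,          # train them in parallel
    random_state=42,
)
bagging.fit(X, y)
```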
Gradual learning methods, on the other hand, are based on the principle of iterative learning. In this approach, each subsequent model tries to fix the errors of the previous model.
Gradual learning creates dependent estimators, as each model takes advantage of the knowledge from the previous estimator.
In iterative learning, each model is learning a different task, but each one contributes to the same goal of accurately predicting the target variable.
As gradual learning follows a sequential model building process, models cannot be trained in parallel.
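For contrast, here is a minimal hand-rolled sketch of gradual learning for regression, again assuming scikit-learn. It is not the exact boosting algorithm you'll meet later in the chapter, but it shows the sequential dependency: each new tree is fit to the errors left by the models before it, so it cannot be built until they are finished.

```python
# A minimal sketch of gradual learning: each estimator is fit to the
# residuals of the previous ones, so training is strictly sequential.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, noise=10, random_state=42)

estimators = []
prediction = np.zeros_like(y, dtype=float)

for _ in range(3):                      # three rounds (placeholder value)
    residuals = y - prediction          # errors left by the previous models
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X, residuals)              # the new model targets those errors
    prediction += tree.predict(X)       # the combined prediction improves gradually
    estimators.append(tree)
```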
3. Gradual learning
Intuitively, gradual learning is similar to the way in which we learn. For example, in the exercises of this course, when you try to apply what you've learned in the videos, you receive feedback on whether or not your code is correct. If it is not, you learn from the feedback and modify your code accordingly for the next attempt. In this way, you are learning iteratively.
In gradual learning, instead of the same model being corrected in every iteration, a new model is built that tries to fix the errors of the previous model.
4. Fitting to noise
While this learning approach sounds promising, you should remain vigilant. Some predictions may be wrong because of noise in the data, not because those data points are genuinely hard to predict. In such cases, you don't want subsequent estimators to focus on those errors: an estimator that fits to noise will lead to overfitting.
One way to control this is to stop training once an estimator's errors start to resemble white noise.
White noise is characterized by the following properties: the errors are uncorrelated with the input features, have zero mean (that is, they are unbiased), and have a constant variance.
If these properties are not met, then the model can still be improved.
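As an illustration only, here is one rough way to probe those properties on a model's residuals. The checks and thresholds below are arbitrary placeholders I'm assuming for the sketch, not established cut-offs from this course.

```python
# A rough sketch of checking whether residuals look like white noise
# (the thresholds and the two-halves variance check are placeholders).
import numpy as np

def residuals_look_like_noise(X, y, model, corr_tol=0.05, bias_tol=0.01):
    """X: 2D NumPy array of features, y: 1D targets, model: fitted regressor."""
    residuals = y - model.predict(X)

    # 1. Roughly unbiased: mean residual close to zero relative to the target scale.
    unbiased = abs(residuals.mean()) < bias_tol * np.abs(y).mean()

    # 2. Roughly uncorrelated with every input feature.
    corrs = [abs(np.corrcoef(X[:, j], residuals)[0, 1]) for j in range(X.shape[1])]
    uncorrelated = max(corrs) < corr_tol

    # 3. Roughly constant variance: compare the spread across two halves of the data.
    half = len(residuals) // 2
    var_ratio = residuals[:half].var() / residuals[half:].var()
    constant_variance = 0.5 < var_ratio < 2.0

    return unbiased and uncorrelated and constant_variance
```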
Another approach to control this is to use an improvement tolerance: if the improvement in performance between iterations falls below that threshold, training is stopped.
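In practice, libraries often implement this for you. For example, scikit-learn's gradient boosting estimators accept n_iter_no_change and tol, which stop adding models once the validation score no longer improves by at least tol; the values below are placeholders.

```python
# Early stopping with an improvement tolerance in scikit-learn's
# gradient boosting (parameter values here are placeholders).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, noise=10, random_state=42)

gbr = GradientBoostingRegressor(
    n_estimators=500,          # upper bound on boosting rounds
    validation_fraction=0.2,   # held-out data used to measure improvement
    n_iter_no_change=5,        # stop after 5 rounds without enough improvement
    tol=1e-4,                  # minimum improvement that still counts
    random_state=42,
)
gbr.fit(X, y)
print(gbr.n_estimators_)       # rounds actually trained before stopping
```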
5. It's your turn!
It's your turn now to check your understanding of gradual learning!