1. Introduction to boosting
Welcome to the final chapter of this course!
2. Model comparison
In the first two chapters, you learned how to use the training data to create a single classifier.
3. Model comparison
In chapters three and four, you applied the ensemble methods bagging and random forests, which use different samples of the training data in parallel to create independent models that rely on the wisdom of the crowd.
The idea is that the combined prediction of the individual models is superior to any of the individual predictions on their own. Because the estimators are independent, they can be trained in parallel to speed up model building.
4. Model comparison
Let's discuss an improvement of this idea:
5. Model comparison
What if each subsequent model tried to fix the errors of the previous model?
6. Model comparison
Then each model would take advantage of the knowledge gained by the previous estimator.
With this approach, the models can no longer be trained in parallel, but the result should be better because each model improves on its predecessor.
7. Model comparison
This technique is called "boosting".
Intuitively, this is similar to the way in which we learn. When you are coding and trying to solve the exercises of this course, you receive feedback on the correctness of your solution. You learn from that feedback, and if you made a mistake, which of course very rarely happens, you modify your code for the next attempt.
This way, you are iteratively learning and improving. That is also the reason why boosted trees generally perform better than bagged trees.
8. Adaboost
The first famous boosting algorithm was AdaBoost, which stands for Adaptive Boosting.
In AdaBoost, each predictor pays more attention to the instances wrongly predicted by its predecessor, which is achieved by continually adjusting the weights of the training instances.
The AdaBoost algorithm works as follows:
Start by training a decision tree where each observation is assigned an equal weight.
9. Adaboost
After evaluating the first tree, increase the weights of the observations that are difficult to classify and lower the weights of those that are easy to classify.
10. Adaboost
Repeat this process for a specified number of iterations. Subsequent trees help in classifying observations that are not well classified by preceding trees.
11. Adaboost
The prediction of the final ensemble model is a weighted sum of the predictions made by the individual tree models.
AdaBoost was later improved by adding a technique called gradient descent to the process. In the next lesson, we will dive deeper into this technique, called "gradient boosting".
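To make the weight-update loop concrete, here is a minimal, illustrative sketch of AdaBoost built from depth-one rpart trees (decision stumps). This is not code from the course: the function names ada_sketch() and ada_predict(), the +1/-1 label coding, and the default of ten rounds are assumptions made for this example, and in practice you would rely on a dedicated implementation such as the xgboost engine introduced next.

library(rpart)

# Train n_rounds weighted decision stumps; y must be coded as +1 / -1.
ada_sketch <- function(x, y, n_rounds = 10) {
  n <- length(y)
  w <- rep(1 / n, n)                  # step 1: every observation gets an equal weight
  stumps <- vector("list", n_rounds)
  alphas <- numeric(n_rounds)
  train  <- data.frame(x, .y = factor(y))

  for (m in seq_len(n_rounds)) {
    # fit a depth-1 tree (stump) using the current observation weights
    fit  <- rpart(.y ~ ., data = train, weights = w,
                  control = rpart.control(maxdepth = 1))
    pred <- ifelse(predict(fit, train, type = "class") == "1", 1, -1)

    # weighted error of this stump and its voting weight (alpha)
    err   <- sum(w * (pred != y)) / sum(w)
    err   <- min(max(err, 1e-10), 1 - 1e-10)   # avoid division by zero
    alpha <- 0.5 * log((1 - err) / err)

    # step 2: raise the weights of misclassified observations, lower the rest
    w <- w * exp(-alpha * y * pred)
    w <- w / sum(w)

    stumps[[m]] <- fit
    alphas[m]   <- alpha
  }
  list(stumps = stumps, alphas = alphas)
}

# step 3: the final prediction is the sign of the alpha-weighted sum of the stump votes
ada_predict <- function(model, x) {
  newdata <- data.frame(x)
  votes <- sapply(seq_along(model$stumps), function(m) {
    p <- predict(model$stumps[[m]], newdata, type = "class")
    model$alphas[m] * ifelse(p == "1", 1, -1)
  })
  sign(rowSums(votes))
}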
12. Coding: Specify a boosted ensemble
Now let's see how to create such a boosted model using tidymodels.
boost_tree() generates the specification of a boosted tree model before fitting and allows the model to be created with different underlying R packages.
Since tidymodels provides a consistent interface, fitting and making predictions follow the same process as for the other models you've used so far.
We set the mode to classification and the engine to "xgboost", a popular R package that implements the gradient boosting algorithm.
With that, you're finished specifying your boosted ensemble.
There are a number of hyperparameters that you can specify and tune, in fact far more than for bagged trees, but let's go with the defaults for now.
Of course, there is more work to do, but the tidymodels interface to boosting is clearly easy to use, and there is little boilerplate needed to get started.
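For reference, a minimal sketch of this specification might look as follows. The object names boost_spec and boost_model are placeholders, and the fit() and predict() lines assume a formula, training set, and test set like the ones used with the earlier models in this course.

library(tidymodels)

# Specify a boosted tree ensemble: classification mode, xgboost engine,
# leaving hyperparameters such as trees, tree_depth, and learn_rate at their defaults
boost_spec <- boost_tree() %>%
  set_mode("classification") %>%
  set_engine("xgboost")

# Fitting and predicting follow the same pattern as for the other models
# (outcome, training_data, and test_data are placeholders)
boost_model <- boost_spec %>%
  fit(outcome ~ ., data = training_data)

predict(boost_model, new_data = test_data)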
13. Let's boost!
Now it's your turn to test your understanding of boosting.