
Ensemble methods

1. Ensemble methods

In this lesson, we're going to discuss ensemble learning, a technique that combines multiple base models to create an optimized predictive model.

2. Ensemble learning techniques

The three ensemble techniques most commonly used in machine learning are bootstrap aggregation (aka bagging), boosting, and model stacking. You may not get a specific question about these methods in a machine learning interview, but a question on bias and variance is almost guaranteed, so let's review them so that you'll be able to answer in the context of both simple models, such as linear regression, and complex ensemble models.

3. Error measurement

Consider the height and diameter of a particular tree species. The black data points belong to the training set and the blue ones to the test set, with the curved line representing the true relationship. Recall that the difference between a model's prediction and the actual data is the error.

4. Short trees

This scatter plot shows that small diameter trees tend to be short,

5. Tall trees

and larger-diameter trees tend to be tall,

6. Fat trees

but after a certain diameter, trees don't get any taller, just bigger around.

7. Linear model

A linear model, no matter how it is fit to the data, is unable to capture the true curved relationship.

8. Bias

Bias is the inability of a machine learning method to capture the true relationship. Assuming the data has a linear relationship when it is actually more complex results in high bias, an underfit model, and poor generalization: a straight line cannot bend to better fit the curved data, so it carries a relatively large amount of bias. Bias decreases as a model grows in complexity, since the model tends toward a more accurate representation of the complex structure in the data.

9. Complex model

We could train a much more complex model, represented by this red squiggly line. Notice how it fits the training points better but fails to fit the test points.

10. Variance

Highly complex models tend to fit the random noise in the training data, creating a large difference between the fits on the training and test datasets. As more complex structures are identified, sensitivity to small changes in the data also increases, leading to high variance, overfitting, and poor model generalization.

11. Bias-Variance Trade-Off

The so-called bias-variance trade-off is a conceptual way of comparing and contrasting different models. In machine learning, the best algorithm has low bias, so it can accurately model the true relationship, but also low variance, meaning it provides consistent predictions across different datasets. This is the sweet spot machine learning seeks: the lowest possible bias and the lowest possible variance.
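This is not the lesson's code, but a minimal sketch on made-up "tree" data showing the trade-off in action: as the polynomial degree grows, training error keeps falling while test error eventually rises once the model starts fitting noise.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: height rises with diameter, then levels off.
rng = np.random.default_rng(42)
diameter = rng.uniform(5, 50, size=40).reshape(-1, 1)
height = 30 * (1 - np.exp(-diameter.ravel() / 15)) + rng.normal(0, 1.5, 40)

X_train, X_test, y_train, y_test = train_test_split(
    diameter, height, test_size=0.5, random_state=0
)

for degree in (1, 3, 15):  # underfit (high bias), balanced, overfit (high variance)
    model = make_pipeline(MinMaxScaler(), PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:7.2f}  test MSE={test_mse:7.2f}")
```

The exact numbers will vary with the random seed, but the degree-1 fit shows high error everywhere (bias), while the degree-15 fit shows low training error and much higher test error (variance).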

12. Bagging (Bootstrap aggregation)

Which leads us to a discussion of bagging, which uses bootstrapped samples. Bootstrapping is a sampling technique in which a subset of the data is selected with replacement, meaning the same row of data may be chosen more than once in a given subset. A model is built with each bootstrapped sample, and the output predictions are averaged, which reduces variance and produces a more accurate model.
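As a quick sketch of what bagging looks like in code, here is `BaggingClassifier` on a stand-in dataset (the breast cancer data and the hyperparameters are illustrative, not the lesson's exact setup):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each of the 50 decision trees sees its own bootstrapped sample (rows drawn
# with replacement); their class-probability predictions are averaged to
# form the final prediction.
bag = BaggingClassifier(n_estimators=50, random_state=1)
bag.fit(X_train, y_train)
print("Bagging accuracy:", accuracy_score(y_test, bag.predict(X_test)))
```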

13. Boosting

Boosting also builds multiple individual models, but does so sequentially: each model learns to reduce the predictive error of the previous ones by reweighting the original dataset so that incorrectly predicted instances count more. The result is a model with decreased bias.
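A comparable sketch for boosting, again on a stand-in dataset with illustrative settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each successive tree (a depth-1 stump by default) up-weights the instances
# the previous trees misclassified.
ada = AdaBoostClassifier(n_estimators=50, random_state=1)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, ada.predict(X_test)))
```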

14. Model stacking

Model stacking takes the predictions from individual models and combines them to create a higher-accuracy model. This technique has been used to win many machine learning competitions. Model stacking uses the predictions of base classifiers as input for training a 2nd-level model. Much care must be taken that the base models have not already "seen" the data they are predicting on; otherwise feeding those predictions to the 2nd-level model leads to overfitting.

15. Vecstack package

The `vecstack` package contains a convenient function called stacking() which helps with this potential problem by taking in a list of instantiated models along with `X_train`, `y_train`, and `X_test`. It then outputs a set of objects that can conveniently be used in the 2nd-level modeling. Here is a code example that you can use to try out this concept. There is also a link to an article with more information at the bottom of the slide.
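A sketch of what that code might look like, using the `vecstack` functional API with a stand-in dataset and an illustrative list of base models (not the lesson's exact setup):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from vecstack import stacking

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 1st-level (base) models
models = [
    KNeighborsClassifier(n_neighbors=5),
    RandomForestClassifier(n_estimators=100, random_state=1),
    LogisticRegression(max_iter=1000),
]

# stacking() produces out-of-fold predictions for the training rows, so the
# 2nd-level model never trains on predictions made for rows the base models
# have already seen.
S_train, S_test = stacking(
    models, X_train, y_train, X_test,
    regression=False, metric=accuracy_score,
    n_folds=4, stratified=True, shuffle=True,
    random_state=1, verbose=0,
)

# 2nd-level model trained on the base models' predictions
meta = LogisticRegression()
meta.fit(S_train, y_train)
print("Stacked accuracy:", accuracy_score(y_test, meta.predict(S_test)))
```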

16. Ensemble functions

We'll import these functions so you'll get the opportunity to practice these techniques in the exercises. BaggingClassifier and AdaBoostClassifier both come from the scikit-learn package and train an ensemble of decision trees by default, while XGBClassifier comes from the XGBoost package.
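For reference, a sketch of those imports (the instantiation defaults here are mine, not necessarily the exercises'):

```python
# BaggingClassifier and AdaBoostClassifier live in scikit-learn,
# XGBClassifier in the xgboost package.
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from xgboost import XGBClassifier

bagging = BaggingClassifier(random_state=1)    # bags decision trees by default
boosting = AdaBoostClassifier(random_state=1)  # boosts decision stumps by default
xgb = XGBClassifier(random_state=1)            # gradient-boosted trees
```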

17. Bagging vs boosting

Just to bring this all together: bagging methods are known to decrease model variance, while boosting methods decrease model bias. Because bias and variance oppose each other, however, decreasing one tends to increase the other, so the goal is to find that sweet spot for the best model generalization.

18. Major ensemble techniques MCQ

Here is a multiple choice question before heading into the exercises. Which of the following statements is true about the three major techniques used for ensemble methods in Machine Learning? Select the correct answer. If the answer is not immediately apparent, pause this video to read through the possible answers and give yourself a moment to think about it. If you still aren't sure, consider re-watching this video lesson up to this point before revealing the answer in the next slide.

19. Major ensemble techniques MCQ: answer

The answer is model stacking, which takes individual model predictions and combines them to create a model that usually has higher accuracy than any single model.

20. Major ensemble techniques MCQ: incorrect answers

These are the reasons why the other answers are incorrect; make sure you understand them.

21. Let's practice!

Now it's your turn...
