Ensembling it all together
1. Ensembling it all together
Congratulations on completing the course! You've learned quite a lot about ensemble methods, and now is a good opportunity to synthesize all of those learnings.
2. Chapter 1: Voting and Averaging
In Chapter One you learned about the basic ensemble methods: Voting and Averaging. Voting combines the predictions of the individual models using the mode. As the mode is a categorical measure, Voting can only be applied to classification. Averaging combines the individual predictions using the mean, so in contrast to Voting, it can be applied to both classification and regression. Both methods are heterogeneous, as the individual models come from different algorithms. They are good choices when you have already built multiple different models for the same problem, when you are not sure which one will perform better on unseen data, or when you simply want to improve the overall performance by combining your existing models.
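As a refresher, here is a minimal sketch of Voting and Averaging using scikit-learn's VotingClassifier and VotingRegressor. The base estimators and the synthetic data are illustrative assumptions, not the exact models used in the course.

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import VotingClassifier, VotingRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Illustrative synthetic data; any classification/regression dataset works here
X_clf, y_clf = make_classification(n_samples=500, random_state=42)
X_reg, y_reg = make_regression(n_samples=500, random_state=42)

# Voting: heterogeneous classifiers combined through the mode of their predictions
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # "hard" voting takes the mode of the predicted labels
)
voting.fit(X_clf, y_clf)

# Averaging: heterogeneous regressors combined through the mean of their predictions
averaging = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("dt", DecisionTreeRegressor()),
        ("knn", KNeighborsRegressor()),
    ]
)
averaging.fit(X_reg, y_reg)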
3. Chapter 2: Bagging
In Chapter Two, you first learned the concept of a "weak" estimator: one that performs just better than random guessing, while also being light and fast. "Weak" estimators are the building blocks for homogeneous ensemble methods, in which all the individual estimators use the same algorithm. You then learned about Bagging, or Bootstrap Aggregating. The bootstrapping technique draws random subsamples with replacement, a large number of "weak" estimators are trained on those subsamples, and their predictions are aggregated by Voting or Averaging. Bagging is the first homogeneous ensemble method you learned. It is a good choice when you want to reduce the variance of the predictions, prevent over-fitting, or gain stability and robustness on unseen data. Keep in mind that Bagging is computationally expensive.
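Here is a minimal Bagging sketch with scikit-learn's BaggingClassifier, where a shallow decision tree plays the role of the "weak" estimator; the hyperparameter values are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data
X, y = make_classification(n_samples=500, random_state=42)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),  # a light, fast "weak" estimator
    # (this parameter is named base_estimator on scikit-learn versions older than 1.2)
    n_estimators=100,   # number of "weak" estimators trained on subsamples
    max_samples=0.8,    # fraction of rows drawn for each bootstrapped subsample
    bootstrap=True,     # draw the subsamples with replacement
    random_state=42,
)
bagging.fit(X, y)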
4. Chapter 3: Boosting
Chapter Three was about Boosting. First, you learned the concept of gradual learning, the second kind of homogeneous ensemble method. It is based on the iterative learning principle, in which each model attempts to fix the errors of the previous one, so the models are built sequentially. You then learned to use the most popular and recent algorithms from the Boosting family: first AdaBoost, then the Gradient Boosting technique and its flavors XGBoost, LightGBM, and CatBoost. Boosting algorithms are a good choice when you have complex problems that are not getting good results with traditional methods, when you need parallel processing or distributed computing, or when you have big datasets or high-dimensional categorical features, which flavors like LightGBM and CatBoost can handle natively.
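Below is a minimal sketch of sequential Boosting using scikit-learn's AdaBoostClassifier and GradientBoostingClassifier; the hyperparameter values are illustrative assumptions. XGBoost, LightGBM, and CatBoost expose a very similar fit/predict interface.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data, split so we can report test accuracy
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost: each new "weak" learner focuses on the samples the previous ones got wrong
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))

# Gradient Boosting: each new "weak" learner fits the remaining errors of the ensemble
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3,
                                 random_state=42)
gbm.fit(X_train, y_train)
print("Gradient Boosting accuracy:", gbm.score(X_test, y_test))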
5. Chapter 4: Stacking
Finally, in Chapter Four you learned about Stacking. Stacking also works by combining individual estimators, but the combiner is an estimator itself rather than just an operation. Stacking can be applied to both classification and regression, and it is also a heterogeneous ensemble method. You learned two ways of implementing Stacking in Python: first by building the model yourself from scratch using pandas and scikit-learn, and then by using the existing implementation from the MLxtend library. Stacking is a good choice when you have already tried Voting or Averaging but the results are not good enough, or when you have built models that perform well in different cases, as the second-layer estimator can identify those patterns.
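Here is a minimal Stacking sketch. It uses scikit-learn's StackingClassifier rather than the from-scratch or MLxtend implementations covered in the chapter, so the estimator choices and parameters are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data
X, y = make_classification(n_samples=500, random_state=42)

stacking = StackingClassifier(
    estimators=[  # first-layer (heterogeneous) estimators
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),  # second-layer estimator that combines them
    cv=5,  # out-of-fold predictions from the first layer train the second layer
)
stacking.fit(X, y)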
6. Thank you and well ensembled!
Congratulations again on completing this course, and thanks a lot for being a part of this learning journey!