1. Introduction to ensemble methods
Welcome to the course! My name is Román de las Heras and I'll be guiding you through this course on ensemble methods. Let's begin with an intuitive introduction to what ensemble methods are.
2. Choosing the best model
When you're building a model, you want to choose the one that performs the best according to some evaluation metric.
For instance, consider this example. Here we trained a Decision Tree, a Logistic Regression, and a K-Nearest Neighbors model.
If accuracy were our chosen metric, then the Decision Tree would be the best choice, right?
The problem with doing this is that we discard the other models, even though they learned different patterns that might carry additional useful information.
What can we do about it?
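The selection step described above can be sketched in code. This is a minimal illustration using a synthetic dataset (the course's actual dataset and scores will differ): train the three models, score each on held-out data, and see that picking only the top scorer throws the others away.

```python
# Sketch of comparing individual models by a single metric (accuracy).
# The dataset here is synthetic, for illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")

# Choosing only the "best" model discards the other two
best = max(scores, key=scores.get)
print("Selected model:", best)
```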
3. Surveys
Consider another example. When you conduct a survey, you don't accept only one "best" answer. You consider a combined response of all the participants, and use statistics like the mode or the mean to represent the responses. The combined responses will likely lead to a better decision than relying on a single response. The same principle applies to ensemble methods, where we could form a new model by combining the existing ones.
The combined model will often perform better than any of the individual models, or at least roughly as well as the best individual model. This is ensemble learning, and it is one of the most effective techniques in machine learning.
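The survey analogy translates directly into code: for classification, the "mode" of the responses is a majority vote over the individual models' predictions. Here is a tiny illustration with made-up predictions from three hypothetical models on five samples:

```python
# Illustrative only: combining binary predictions with a majority vote
# (the mode), mirroring the survey analogy. These predictions are made up.
import numpy as np

pred_tree = np.array([1, 0, 1, 1, 0])    # hypothetical Decision Tree output
pred_logreg = np.array([1, 1, 1, 0, 0])  # hypothetical Logistic Regression output
pred_knn = np.array([0, 0, 1, 1, 0])     # hypothetical KNN output

all_preds = np.vstack([pred_tree, pred_logreg, pred_knn])

# With three binary voters, the majority label is 1 when at least 2 vote 1
majority = (all_preds.sum(axis=0) >= 2).astype(int)
print(majority)  # → [1 0 1 1 0]
```

Each combined prediction agrees with at least two of the three models, just as the mode of survey responses represents the group rather than any single participant.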
In this course, you'll learn how to ensemble those individual models into a combined, final model.
4. Prerequisite knowledge
This course assumes that you are proficient in supervised machine learning and are familiar with the scikit-learn framework. If not, I recommend that you go through the DataCamp courses listed on this slide to ensure that you are well prepared to learn about ensemble methods.
5. Technologies
We'll be working with scikit-learn as our primary machine learning framework, alongside other Python libraries you might already be familiar with: numpy, pandas, and seaborn.
In addition, you'll be introduced to a useful Python library for machine learning called mlxtend.
As a quick preview of the code you'll learn to write, the main scikit-learn module we'll be using is sklearn dot ensemble. You'll notice how we first import one of the meta estimators. Then, we'll build the individual models, also known as the base estimators, which will serve as input parameters for the combined model.
Usually, a meta estimator receives a list of estimators as one of its parameters, plus some additional parameters that are specific to the ensemble method.
The best feature of a meta estimator is that it works just like the scikit-learn estimators you already know, with the standard fit and predict methods.
6. Learners, ensemble!
Let's now jump into some interactive exercises and refresh our knowledge of scikit-learn. Learners, ensemble!