Ensemble Learning

1. Ensemble Learning

In this lesson, you will learn about a supervised learning technique known as ensemble learning.

2. Advantages of CARTs

Let's first recap what we learned in the previous chapter about CARTs. CARTs present many advantages. For example, they are easy to understand and their output is easy to interpret. In addition, CARTs are easy to use, and their flexibility gives them the ability to describe nonlinear dependencies between features and labels. Moreover, you don't need much feature preprocessing to train a CART. In contrast to many other models, you don't have to standardize or normalize features before feeding them to a CART.

3. Limitations of CARTs

CARTs also have limitations. A classification tree, for example, can only produce orthogonal decision boundaries. CARTs are also very sensitive to small variations in the training set: sometimes, when a single point is removed from the training set, a CART's learned parameters may change drastically. CARTs also suffer from high variance when they are trained without constraints; in that case, they may overfit the training set. A solution that takes advantage of the flexibility of CARTs while reducing their tendency to memorize noise is ensemble learning.

4. Ensemble Learning

Ensemble learning can be summarized as follows:

- As a first step, different models are trained on the same dataset.
- Each model makes its own predictions.
- A meta-model then aggregates the predictions of the individual models and outputs a final prediction.
- The final prediction is more robust and less prone to errors than that of any individual model.
- The best results are obtained when the models are skillful in different ways, meaning that if some models make predictions that are way off, the other models should compensate for these errors. In that case, the meta-model's predictions are more robust.

5. Ensemble Learning: A Visual Explanation

Let's take a look at the diagram here to visually understand how ensemble learning works for a classification problem. First, the training set is fed to different classifiers. Each classifier learns its parameters and makes predictions. These predictions are then fed to a meta-model, which aggregates them and outputs a final prediction.

6. Ensemble Learning in Practice: Voting Classifier

Let's now take a look at an ensemble technique known as the voting classifier. More concretely, we'll consider a binary classification task. The ensemble here consists of N classifiers making the predictions P1, P2, ..., PN, where each prediction is either 0 or 1. The meta-model outputs the final prediction by hard voting.

7. Hard Voting

To understand hard voting, consider a voting classifier that consists of 3 trained classifiers, as shown in the diagram here. While classifiers 1 and 3 predict the label 1 for a new data point, classifier 2 predicts the label 0. In this case, the label 1 has 2 votes while the label 0 has 1 vote. As a result, the voting classifier predicts 1.
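As a minimal sketch of this majority-vote rule (the predictions here are illustrative, not taken from the lesson's dataset), hard voting over binary predictions can be computed like this:

```python
import numpy as np

# Illustrative binary predictions from 3 classifiers for one data point
predictions = np.array([1, 0, 1])

# Hard voting: the label with the most votes wins
final_prediction = np.bincount(predictions).argmax()
print(final_prediction)  # prints 1, since label 1 got 2 of the 3 votes
```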

8. Voting Classifier in sklearn (Breast-Cancer dataset)

Now that you know what a voting classifier is, let's train one on the breast cancer dataset using scikit-learn. You'll do so using all the features in the dataset to predict whether a cell is malignant or benign. In addition to the usual imports, import LogisticRegression, DecisionTreeClassifier, and KNeighborsClassifier. You also need to import VotingClassifier from sklearn.ensemble.
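A sketch of those imports might look as follows; the "usual imports" are assumed here to include train_test_split and accuracy_score:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

# Assumed "usual imports" for splitting and evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```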

9. Voting Classifier in sklearn (Breast-Cancer dataset)

Then, split the data into 70% train and 30% test, and instantiate the different models as shown here. After that, define a list named classifiers that contains tuples pairing the name of each model with the model itself.
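Assuming the features and labels are already loaded into arrays X and y, and using an arbitrary random_state for reproducibility, the setup might look like this:

```python
# 70%-train / 30%-test split (random_state is an arbitrary choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Instantiate the individual models
lr = LogisticRegression(random_state=1)
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier(random_state=1)

# List of (name, model) tuples
classifiers = [('Logistic Regression', lr),
               ('K Nearest Neighbours', knn),
               ('Classification Tree', dt)]
```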

10. Voting Classifier in sklearn (Breast-Cancer dataset)

You can now write a for loop to iterate over the list classifiers, fit each classifier to the training set, evaluate its accuracy on the test set, and print the result. The output shows that the best individual classifier, LogisticRegression, achieves an accuracy of 94.7%.
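A sketch of that loop, reusing the classifiers list and the train/test split defined above:

```python
# Fit each classifier and print its test set accuracy
for clf_name, clf in classifiers:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print('{:s} : {:.3f}'.format(clf_name, accuracy_score(y_test, y_pred)))
```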

11. Voting Classifier in sklearn (Breast-Cancer dataset)

Finally, you can instantiate a voting classifier vc by setting the estimators parameter to classifiers. Fitting vc to the training set yields a test set accuracy of 95.3%. This accuracy is higher than that achieved by any of the individual models in the ensemble.
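A sketch of this final step; note that VotingClassifier performs hard voting by default:

```python
# Build a voting classifier from the individual models
vc = VotingClassifier(estimators=classifiers)

# Fit to the training set and evaluate on the test set
vc.fit(X_train, y_train)
y_pred = vc.predict(X_test)
print('Voting Classifier: {:.3f}'.format(accuracy_score(y_test, y_pred)))
```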

12. Let's practice!

Now it's time to put this into practice.