Ensemble Learning

1. Ensemble Learning

In this lesson, you will learn about a supervised learning technique known as ensemble learning.

2. Advantages of CARTs

Let's first recap what we learned in the previous chapter about CARTs. CARTs present many advantages. For example, they are easy to understand and their output is easy to interpret. In addition, CARTs are easy to use, and their flexibility gives them the ability to describe nonlinear dependencies between features and labels. Moreover, you don't need much feature preprocessing to train a CART. In contrast to many other models, you don't have to standardize or normalize features before feeding them to a CART.

3. Limitations of CARTs

CARTs also have limitations. A classification tree, for example, can only produce orthogonal decision boundaries. CARTs are also very sensitive to small variations in the training set: sometimes, when a single point is removed from the training set, a CART's learned parameters may change drastically. CARTs also suffer from high variance when they are trained without constraints; in that case, they may overfit the training set. A solution that takes advantage of the flexibility of CARTs while reducing their tendency to memorize noise is ensemble learning.

4. Ensemble Learning

Ensemble learning can be summarized as follows:

- As a first step, different models are trained on the same dataset.
- Each model makes its own predictions.
- A meta-model then aggregates the predictions of the individual models and outputs a final prediction.
- The final prediction is more robust and less prone to errors than that of any individual model.
- The best results are obtained when the models are skillful in different ways, meaning that if some models make predictions that are way off, the other models should compensate for these errors. In that case, the meta-model's predictions are more robust.

5. Ensemble Learning: A Visual Explanation

Let's take a look at the diagram here to visually understand how ensemble learning works for a classification problem. First, the training set is fed to different classifiers. Each classifier learns its parameters and makes predictions. These predictions are then fed to a meta-model, which aggregates them and outputs a final prediction.

6. Ensemble Learning in Practice: Voting Classifier

Let's now take a look at an ensemble technique known as the voting classifier. More concretely, we'll consider a binary classification task. The ensemble here consists of N classifiers making the predictions P1, P2, ..., PN, where each prediction is either 0 or 1. The meta-model outputs the final prediction by hard voting.

7. Hard Voting

To understand hard voting, consider a voting classifier that consists of 3 trained classifiers, as shown in the diagram here. While classifiers 1 and 3 predict the label 1 for a new data point, classifier 2 predicts the label 0. In this case, the label 1 has 2 votes while the label 0 has 1 vote. As a result, the voting classifier predicts 1.
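As a minimal sketch of this majority-vote rule (the predictions here are illustrative, not taken from the lesson's dataset), hard voting over binary predictions can be computed like this:

```python
import numpy as np

# Illustrative binary predictions from 3 classifiers for one data point
predictions = np.array([1, 0, 1])

# Hard voting: the label with the most votes wins
final_prediction = np.bincount(predictions).argmax()
print(final_prediction)  # prints 1, since label 1 got 2 of the 3 votes
```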

8. Voting Classifier in sklearn (Breast-Cancer dataset)

Now that you know what a voting classifier is, let's train one on the breast cancer dataset using scikit-learn. You'll do so using all the features in the dataset to predict whether a cell is malignant or benign. In addition to the usual imports, import LogisticRegression, DecisionTreeClassifier, and KNeighborsClassifier. You also need to import VotingClassifier from sklearn.ensemble.
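A sketch of those imports might look as follows; the "usual imports" are assumed here to include train_test_split and accuracy_score:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

# Assumed "usual imports" for splitting and evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```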

9. Voting Classifier in sklearn (Breast-Cancer dataset)

Then, split the data into 70% train and 30% test, and instantiate the different models as shown here. After that, define a list named classifiers that contains tuples pairing the name of each model with the model itself.
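Assuming the features and labels are already loaded into arrays X and y, and using an arbitrary random_state for reproducibility, the setup might look like this:

```python
# 70%-train / 30%-test split (random_state is an arbitrary choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Instantiate the individual models
lr = LogisticRegression(random_state=1)
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier(random_state=1)

# List of (name, model) tuples
classifiers = [('Logistic Regression', lr),
               ('K Nearest Neighbours', knn),
               ('Classification Tree', dt)]
```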

10. Voting Classifier in sklearn (Breast-Cancer dataset)

You can now write a for loop to iterate over the list classifiers, fit each classifier to the training set, evaluate its accuracy on the test set, and print the result. The output shows that the best individual classifier, LogisticRegression, achieves an accuracy of 94.7%.
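A sketch of that loop, reusing the classifiers list and the train/test split defined above:

```python
# Fit each classifier and print its test set accuracy
for clf_name, clf in classifiers:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print('{:s} : {:.3f}'.format(clf_name, accuracy_score(y_test, y_pred)))
```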

11. Voting Classifier in sklearn (Breast-Cancer dataset)

Finally, you can instantiate a voting classifier vc by setting the estimators parameter to classifiers. Fitting vc to the training set yields a test set accuracy of 95.3%. This accuracy is higher than that achieved by any of the individual models in the ensemble.
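A sketch of this final step; note that VotingClassifier performs hard voting by default:

```python
# Build a voting classifier from the individual models
vc = VotingClassifier(estimators=classifiers)

# Fit to the training set and evaluate on the test set
vc.fit(X_train, y_train)
y_pred = vc.predict(X_test)
print('Voting Classifier: {:.3f}'.format(accuracy_score(y_test, y_pred)))
```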

12. Let's practice!

Now it's time to put this into practice.