Adaptive boosting: award-winning model
1. Adaptive boosting: award-winning model
Welcome to the second lesson! Here, you'll learn about a gradual learning ensemble model: adaptive boosting, also known as AdaBoost. This is an award-winning model with a high potential to solve complex problems.
2. Award-winning model
AdaBoost is a boosting ensemble method proposed by Yoav Freund and Robert Schapire in 1997. Six years later, they won the Gödel Prize for this algorithm; the prize is an annual award for outstanding papers in theoretical computer science, named in honor of the Austrian mathematician Kurt Gödel. Besides being the first machine learning algorithm to win this prize, AdaBoost was also the first practical boosting algorithm. Today, it remains widely used and well known among machine learning practitioners.
3. AdaBoost properties
There are two distinctive properties of Adaptive Boosting compared to other boosting algorithms. First, the instances for each subsequent dataset are drawn using a sample distribution over the training data. This distribution ensures that instances which were harder to predict for the previous estimator have a higher chance of being included in the next estimator's training set, by giving them higher weights. The distribution is initialized to be uniform. Second, the estimators are combined through weighted majority voting. The voting weights are based on each estimator's training error: estimators which have shown good performance are rewarded with higher voting weights. In addition, AdaBoost is guaranteed to improve as the ensemble grows, provided each estimator has an error rate below 0.5; in other words, each estimator only needs to be a "weak" model. And similar to bagging, AdaBoost can be used for both classification and regression with its two variations.
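To make these two properties concrete, here is a minimal sketch of the classic binary AdaBoost weight update and weighted vote. It assumes labels encoded as -1/+1; the names adaboost_sketch, predict_sketch and n_rounds are illustrative, and this is not scikit-learn's exact implementation.

```python
# Minimal sketch of binary AdaBoost, assuming labels y in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_sketch(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)            # sample distribution, initialized to uniform
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)  # a "weak" decision stump
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)       # weighted training error
        if err >= 0.5:                  # weak-learner condition violated: stop
            break
        err = max(err, 1e-10)           # avoid division by zero on a perfect fit
        alpha = 0.5 * np.log((1 - err) / err)        # estimator's voting weight
        w *= np.exp(-alpha * y * pred)  # misclassified instances get higher weights
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict_sketch(stumps, alphas, X):
    # Weighted majority vote of the weak estimators
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)
```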
4. AdaBoost classifier with scikit-learn
Now let's see how to instantiate an AdaBoost classifier with scikit-learn. As with other ensembles, you need to import AdaBoostClassifier from the scikit-learn ensemble module. Then you can instantiate an AdaBoost classifier by calling it with the corresponding parameters. The base_estimator parameter works as usual: it's the weak model template for all the estimators. If not specified, the default is a Decision Tree classifier with a max depth of 1, also known as a decision stump. The second parameter is the number of estimators we want to use; by default it is 50. If there's a perfect fit, or an estimator with an error higher than 50%, no more estimators are built. Another important parameter is the learning rate, which represents how much each estimator contributes to the ensemble; it is 1.0 by default. In addition, there is a trade-off between the number of estimators and the learning rate.
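A minimal instantiation following this description might look like the snippet below. The values shown are simply the defaults described above, and the commented fit/predict calls assume hypothetical X_train, y_train and X_test arrays; note that newer scikit-learn releases rename base_estimator to estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Decision stump as the weak model template (this is also the default)
clf_ada = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,      # default number of estimators
    learning_rate=1.0     # default contribution of each estimator
)

# Typical usage (hypothetical data):
# clf_ada.fit(X_train, y_train)
# y_pred = clf_ada.predict(X_test)
```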
5. AdaBoost regressor with scikit-learn
In a similar way, we can build an AdaBoost regressor. The AdaBoostRegressor class is also found in the scikit-learn ensemble module. To instantiate an AdaBoost regression model, we call it with the same parameters. There's one difference with the base_estimator parameter: if it's not specified, the default is a Decision Tree regressor with a max depth of 3, as opposed to the classifier, which has a max depth of 1. In addition, we have the loss parameter, which is the function used to update the weights. By default it is linear, but you can also use the square or exponential loss.
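Here is the matching sketch for the regressor, again using the defaults described above; the commented fit/predict calls assume hypothetical training and test arrays.

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

reg_ada = AdaBoostRegressor(
    base_estimator=DecisionTreeRegressor(max_depth=3),  # default weak model for regression
    n_estimators=50,
    learning_rate=1.0,
    loss='linear'        # 'linear' (default), 'square', or 'exponential'
)

# Typical usage (hypothetical data):
# reg_ada.fit(X_train, y_train)
# y_pred = reg_ada.predict(X_test)
```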
6. Let's practice!
Time to practice!