
Bootstrap aggregating

1. Bootstrap aggregating

Having learned about weak models, you're now ready to learn about Bootstrap Aggregating, also known as "Bagging".

2. Heterogeneous vs Homogeneous Ensembles

Until now, you've only seen heterogeneous ensemble methods, which combine different types of fine-tuned algorithms and therefore tend to work well with a small number of estimators. For example, we could combine a decision tree, a logistic regression, and a support vector machine using voting to improve the results. Voting, Averaging, and Stacking belong to this family. Homogeneous ensemble methods such as bagging, on the other hand, apply the same algorithm to all the estimators, and that algorithm must be a "weak" model. In practice, we end up working with a large number of "weak" estimators in order to achieve better performance than that of a single model. Bagging and Boosting are among the most popular methods of this kind.
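To make the contrast concrete, here is a minimal sketch in Python with scikit-learn. The dataset, hyperparameters, and model choices are illustrative assumptions, not part of the lesson.

```python
# Illustrative sketch: heterogeneous (voting) vs homogeneous (bagging) ensembles.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Heterogeneous: a few different, individually tuned algorithms combined by voting.
voting = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=4)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression())),
    ("svm", make_pipeline(StandardScaler(), SVC())),
])

# Homogeneous: many copies of the same "weak" model (shallow trees), bagged.
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=200)

for name, model in [("Voting", voting), ("Bagging", bagging)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", round(model.score(X_test, y_test), 3))
```

Notice that the voting ensemble uses only three estimators, while the bagged ensemble relies on two hundred shallow trees, mirroring the small-versus-large distinction described above.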

3. Condorcet's Jury Theorem

You might be wondering how it is possible for a large group of "weak" models to achieve good performance. Once again, this is the wisdom of the crowd at work. Do homogeneous ensemble methods also have that potential? Well, that's what Condorcet showed with his theorem, known as "Condorcet's Jury Theorem". The requirements for this theorem are the following: first, all the models must be independent; second, each model performs better than random guessing; and finally, all individual models have similar performance. If these three conditions are met, then adding more models increases the probability that the ensemble is correct, and makes this probability tend to 1, equivalent to 100%! The second and third requirements can be fulfilled by using the same "weak" model for all the estimators, as then all of them will have similar performance and be better than random guessing.
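As a rough numeric illustration (not course code), the probability that a majority vote of independent models is correct follows directly from the binomial distribution; even when each model is only slightly better than random guessing, that probability climbs toward 1 as models are added. The accuracy value 0.6 below is an assumed figure for the sake of the example.

```python
# Condorcet's Jury Theorem, numerically: with independent voters each correct
# with probability p > 0.5, the chance the majority is correct tends to 1.
from math import comb

def majority_correct(p, n):
    """Probability that more than half of n independent voters are correct."""
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p = 0.6  # each "weak" model is only slightly better than random guessing
for n in (1, 11, 101, 1001):
    print(f"{n:5d} models -> P(majority correct) = {majority_correct(p, n):.4f}")
```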

4. Bootstrapping

To guarantee the first requirement of the theorem, the bagging algorithm trains each individual model on a random subsample of the data. This is known as bootstrapping, and it provides some of the characteristics of a wise crowd. If you recall, a wise crowd needs to be diverse, either through using different algorithms or different datasets. Here we're using the same weak model for all the estimators, but the dataset for each is a different subsample, which provides diversity. Other properties of a wise crowd are independence and lack of correlation, which are implicit in bootstrapping since the samples are drawn separately. After the individual models are trained on their respective samples, they are aggregated using voting or averaging.
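Below is a hand-rolled sketch of that procedure, assuming shallow decision trees as the weak model and majority voting as the aggregation step; in practice scikit-learn's BaggingClassifier does all of this for you, and the dataset and settings here are illustrative.

```python
# Illustrative bagging by hand: bootstrap samples -> weak models -> majority vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_estimators, n_samples = 100, len(X_train)
models = []

for _ in range(n_estimators):
    # Bootstrapping: draw training rows with replacement, so every weak
    # model sees a slightly different version of the dataset.
    idx = rng.integers(0, n_samples, size=n_samples)
    tree = DecisionTreeClassifier(max_depth=2).fit(X_train[idx], y_train[idx])
    models.append(tree)

# Aggregation: majority vote over the individual predictions (labels are 0/1).
all_preds = np.array([m.predict(X_test) for m in models])
ensemble_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
print("Bagged ensemble accuracy:", np.mean(ensemble_pred == y_test))
```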

5. Pros and cons of bagging

Why is bagging a useful technique? First, it helps reduce variance, because the sampling is truly random. Bias can also be reduced, since we use voting or averaging to combine the models. Thanks to the high number of estimators used, bagging provides stability and robustness. However, bagging is computationally expensive in terms of both space and time.

6. It's time to practice!

Let's now get some practice!
