1. The strength of "weak" models
Hello learners! Welcome to the second chapter of our course! In this chapter, you'll learn about a popular and widely used ensemble method: Bagging.
In this first lesson, we'll see what a "weak" model is and how to identify one by its properties.
2. "Weak" model
Voting and averaging, which you learned about in the previous chapter, work by combining the predictions of already trained models.
These techniques usually combine a small number of estimators, each fine-tuned and individually optimized for the problem.
In fact, these estimators are so well trained that, in some cases, they produce decent results on their own.
We'll refer to these estimators as fine-tuned.
This approach is appropriate when you already have optimized models and want to improve performance further by combining them.
But what happens when you don't have these estimators trained beforehand? Well, that's when "weak" estimators come into play.
3. Fine-tuned vs "weak" models
You may ask yourself: what's the difference between a weak and a fine-tuned model?
First, let's see what a "weak" model is.
Here, "weak" doesn't mean the model is bad, just that it is not as strong as a highly optimized, fine-tuned one.
4. Properties of "weak" models
A weak estimator, or model, is one that performs just slightly better than random guessing: for binary classification, its error rate is below 50%, but close to it.
A weak model should be light in terms of space and computational requirements, and fast during training and evaluation.
One good example is a decision tree. Imagine that we fit a decision tree to the data, but instead of optimizing it completely, we limit it to a depth of two. This restriction prevents the tree from learning as much as it otherwise could, but guarantees the three desired properties: modest performance (just above random guessing), a light footprint (only two levels of decisions to store), and, as a consequence, fast predictions.
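As a minimal sketch of this idea, assuming scikit-learn and its built-in breast cancer dataset purely for illustration, a depth-two tree might look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset (any binary classification data would do)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth=2 keeps the tree "weak": limited performance,
# small to store, and fast to train and evaluate
weak_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
weak_tree.fit(X_train, y_train)
print(weak_tree.score(X_test, y_test))
```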
5. Examples of "weak" models
These are some common examples of weak models. As we stated before, a decision tree constrained to a small depth can be used as a weak model.
Another example is logistic regression, which assumes that the classes are linearly separable. This is not always true; in those cases logistic regression will make mistakes, but it can still be useful as a weak estimator. We can also limit the number of training iterations, or set a high value for the parameter C, which corresponds to weak regularization.
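As a hedged sketch, assuming scikit-learn and a synthetic dataset built with make_classification just for illustration, a deliberately weak logistic regression might be configured like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Few iterations plus a high C (weak regularization) keep the model weak;
# a ConvergenceWarning here is expected for an under-trained model
weak_logreg = LogisticRegression(max_iter=20, C=100.0)
weak_logreg.fit(X, y)
print(weak_logreg.score(X, y))
```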
For regression problems, we have linear regression. Like logistic regression, it assumes that the output is a linear function of the input features, and it also relies on those features being independent. Because of these simple assumptions, we can use it as a weak estimator.
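In the same spirit, assuming scikit-learn and synthetic data from make_regression purely for illustration, a weak linear estimator could be as simple as:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data purely for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Linear regression's simple assumptions make it light, fast, and "weak"
weak_linreg = LinearRegression().fit(X_train, y_train)
print(weak_linreg.score(X_test, y_test))  # R^2 on held-out data
```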
Since it is the properties that matter most, any other estimator that satisfies the three desired properties can be used as a weak model as well.
6. Let's practice!
Now, let's get some practice with weak models!