Model ensembling

1. Model ensembling

So far, we've been talking only about individual models. Now it's time to combine multiple models.

2. Model ensembling

Top Kaggle solutions are usually not a single model, but a combination of a large number of different models. Combining models in different ways is called model ensembling. For example, here is an ensemble design for a winning solution in the Homesite Quote Conversion challenge. We can see hundreds of models with multi-level stacking and blending. Let's learn what these 'blending' and 'stacking' terms mean.

3. Model blending

The idea of ensemble learning is to build a prediction model by combining the strengths of a collection of simpler base models. The so-called blending approach simply averages the predictions of our multiple models. Say we're solving a regression problem with a continuous target variable, and we have trained two models: A and B. For each test observation, we then have both model A's and model B's predictions available.

4. Model blending

To combine the models, we can just take the mean of the predictions: sum them and divide by two. As we see, this adjusts the predictions to take both model A's and model B's opinions into account. That's it: in the majority of cases, even such a simple ensembling method will yield some improvement over the single models.
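As a minimal sketch, assume pred_a and pred_b are NumPy arrays holding model A's and model B's predictions for the test observations (the values below are made up for illustration):

import numpy as np

# Hypothetical predictions of models A and B on the test data
pred_a = np.array([1.2, 4.8, 3.1])
pred_b = np.array([1.6, 5.0, 2.7])

# Blend by arithmetic mean: sum the predictions and divide by two
blended_pred = (pred_a + pred_b) / 2  # [1.4, 4.9, 2.9]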

5. Model blending

The arithmetic mean works for both regression and classification problems. However, for classification, it's better to use a geometric mean of the predicted class probabilities.
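A quick sketch of the geometric mean for classification, assuming proba_a and proba_b are the positive-class probabilities predicted by two classifiers (again, the arrays are illustrative):

import numpy as np

# Hypothetical positive-class probabilities from two classifiers
proba_a = np.array([0.9, 0.4, 0.7])
proba_b = np.array([0.8, 0.2, 0.6])

# Geometric mean of two models: square root of the product
blended_proba = np.sqrt(proba_a * proba_b)

# Equivalent form for any number of models: exponentiate the mean log-probability
blended_proba_general = np.exp(np.mean(np.log([proba_a, proba_b]), axis=0))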

6. Model stacking

A more advanced ensembling approach is called model stacking. The idea is to train multiple single models, take their predictions, and use those predictions as features in a 2nd level model. So, we need to perform the following steps: split the train data into two parts, Part 1 and Part 2; train multiple single models on the first part; make predictions on the second part of the train data and on the test data. Now we have model predictions for both Part 2 of the train data and the test data, which means we can create a new model that uses these predictions as features. This model is called the 2nd level model, or meta-model. Its predictions on the test data give us the stacking output.

7. Stacking example

Let's walk through all these steps with an example. Suppose we are given a binary classification problem with a bunch of numerical features: feature_1, feature_2, and so on up to feature_N. For the train data, the target variable is known, and we need to make predictions on the test data, where the target variable is unknown.

8. Stacking example

First of all, we split the train data into two separate parts: Part 1 and Part 2.
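A minimal sketch of this step, using a synthetic dataset as a stand-in for the competition data (all names, sizes, and the 50/50 split ratio are illustrative assumptions):

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data with feature_1 ... feature_5
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
features = [f'feature_{i}' for i in range(1, 6)]
data = pd.DataFrame(X, columns=features)
data['target'] = y

# Hold out part of the data as the 'test' set with an unknown target
train, test = train_test_split(data, test_size=0.3, random_state=42)

# Split the train data into Part 1 and Part 2 for stacking
part_1, part_2 = train_test_split(train, test_size=0.5, random_state=42)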

9. Stacking example

Then we train multiple single models only on the first part of the train data. For example, suppose we've trained three different models, denoted A, B, and C.
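Continuing the sketch above, the three base models might look like this (the particular estimators are an arbitrary choice for illustration):

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Train three single models on Part 1 only
model_a = LogisticRegression().fit(part_1[features], part_1['target'])
model_b = RandomForestClassifier(random_state=42).fit(part_1[features], part_1['target'])
model_c = GradientBoostingClassifier(random_state=42).fit(part_1[features], part_1['target'])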

10. Stacking example

Having these three models, we make predictions on Part 2 of the train data. The columns with the predictions are denoted A_pred, B_pred and C_pred. Then we make predictions on the test data as well.
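Continuing the sketch, we can store each model's predicted positive-class probability as a new column on both Part 2 and the test data:

# Work on copies to avoid pandas chained-assignment warnings
part_2 = part_2.copy()
test = test.copy()

# Add each base model's prediction as a new feature column
for col, model in [('A_pred', model_a), ('B_pred', model_b), ('C_pred', model_c)]:
    part_2[col] = model.predict_proba(part_2[features])[:, 1]
    test[col] = model.predict_proba(test[features])[:, 1]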

11. Stacking example

So, now we have model predictions for both Part 2 of the train data and the test data.

12. Stacking example

It's now possible to create a second level model using these predictions as features. It's trained on Part 2 of the train data and is used to make predictions on the test data.
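In the sketch, the 2nd level model could be another simple classifier trained on the three prediction columns (again, the choice of estimator is illustrative):

# 2nd level (meta) model trained on Part 2, using the base models' predictions as features
meta_features = ['A_pred', 'B_pred', 'C_pred']
meta_model = LogisticRegression().fit(part_2[meta_features], part_2['target'])

# Stacking predictions for the test data
stacking_pred = meta_model.predict_proba(test[meta_features])[:, 1]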

13. Stacking example

As a result, we obtain stacking predictions for the test data. Thus, we have combined the individual model predictions into a single number using a 2nd level model.

14. Let's practice!

OK, having learned the theory, let's move on to building your own model ensembles!