
Overfitting and ensembling

1. Overfitting and ensembling

You may have noticed most of our models do well on the training dataset, but not so well on the test dataset. This is called overfitting. There are multiple ways to deal with it; we'll look at how to use dropout with neural nets to combat overfitting, and we'll also learn about ensembling models for improved predictions.

2. Overfitting

As we've seen with most of our models, the training data is fit well, but the test data is not. This is a sign we're overfitting. Depending on your model, you can tune different hyperparameters to decrease overfitting. With neural networks, we have a lot of hyperparameters we can set.

3. Simplify your model

One way to combat overfitting is to decrease the complexity of our model. We saw with decision trees that very deep trees will overfit the data like crazy. We can decrease that overfitting by limiting the depth of the tree. The equivalent in neural nets would be to decrease the total number of neurons.
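As a minimal sketch of what limiting tree depth looks like with scikit-learn (the toy data and the max_depth value of 5 here are illustrative assumptions, not the course's actual setup):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the course's features and targets
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X[:, 0] + 0.1 * rng.randn(200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep_tree = DecisionTreeRegressor().fit(X_train, y_train)               # unlimited depth
shallow_tree = DecisionTreeRegressor(max_depth=5).fit(X_train, y_train)  # limited depth

# A large gap between train and test R-squared is a sign of overfitting
print(deep_tree.score(X_train, y_train), deep_tree.score(X_test, y_test))
print(shallow_tree.score(X_train, y_train), shallow_tree.score(X_test, y_test))
```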

4. Neural network options

For neural nets, we have lots of options to combat overfitting: decreasing the number of nodes, L1/L2 regularization, dropout, autoencoder designs, early stopping, adding noise, max norm constraints, and ensembling.

5. Dropout

We'll use dropout here because it's easy to implement and can work fairly well. In keras, it's just another layer in our neural net stack. Dropout randomly drops a fraction of neurons during training. This helps the net distribute its learning throughout the model, and also helps the net better generalize to unseen data.

6. Dropout in keras

Adding dropout in keras is easy. We import the Dropout layer from keras-dot-layers, then simply add it as another layer with model-dot-add(). The number we set in the Dropout layer is the fraction of neurons to drop during training. Since we are using very small nets, we should set this number to 0-point-1 or 0-point-2; for larger models like the one shown here, it's often set to 0-point-5.
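Here's a minimal sketch of what this looks like, assuming the standalone keras Sequential API; the layer sizes, input dimension, and dropout fraction are hypothetical choices, not the course's exact model:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Hypothetical small regression net; layer sizes are illustrative
model = Sequential()
model.add(Dense(100, input_dim=3, activation='relu'))
model.add(Dropout(0.2))  # drop 20% of this layer's neurons during training
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mse')
```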

7. Test set comparison

If we now compare the test set performance, we can see the dropout model does a bit better. The train and test scores become more similar when dropout is added, so we are moving away from overfitting.

8. Ensembling

Another way to fight overfitting is to use ensembling. A random forest is an example of ensembling many decision trees. We simply average the predictions to get a final prediction. This is the kind of ensembling we'll learn here -- simple averaging. You could also take the predictions from multiple models and feed them into another model for a more advanced type of ensembling.

9. Implementing ensembling

We'll now ensemble models using simple averaging, which we can implement with numpy. We get the predictions from each of our models and save them as test_pred1 and test_pred2, then horizontally stack them with numpy-dot-hstack(). hstack takes a tuple of arrays as an argument, and turns our two column vectors into a matrix by stacking them side by side as columns. We can then use numpy-dot-mean() with axis=1 to average along each row, across the two columns, which gives us one prediction for each sample.
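Here's a minimal sketch of that averaging step. The prediction values are made up; in practice, test_pred1 and test_pred2 would be the (n_samples, 1) column vectors returned by two models' predict() calls:

```python
import numpy as np

# Hypothetical column-vector predictions from two models
test_pred1 = np.array([[0.12], [0.30], [-0.05]])
test_pred2 = np.array([[0.10], [0.26], [0.01]])

# Stack the two (n, 1) columns into an (n, 2) matrix
stacked = np.hstack((test_pred1, test_pred2))

# Average along each row (axis=1) -> one ensembled prediction per sample
test_preds = np.mean(stacked, axis=1)
print(test_preds)  # [ 0.11  0.28 -0.02]
```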

10. Comparing the ensemble

Looking at the R-squared test scores, we can see a slight improvement with the ensemble compared to the individual models, although the effect here is small.
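As a sketch of how such a comparison might be computed with scikit-learn's r2_score (the predictions and targets here are invented, not the course's results):

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical predictions and true targets, continuing the sketch above
test_pred1 = np.array([0.12, 0.30, -0.05])
test_pred2 = np.array([0.10, 0.26, 0.01])
test_preds = np.mean(np.hstack((test_pred1.reshape(-1, 1),
                                test_pred2.reshape(-1, 1))), axis=1)
y_test = np.array([0.10, 0.29, -0.01])

print(r2_score(y_test, test_pred1))  # model 1 alone
print(r2_score(y_test, test_pred2))  # model 2 alone
print(r2_score(y_test, test_preds))  # simple-average ensemble
```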

11. Dropout and ensemble!

Ok, it's time for you to try dropout and ensembling.