Combatting overfitting with dropout
A common problem with neural networks is that they tend to overfit to the training data. This means the scoring metric, such as R\(^2\) or accuracy, is high for the training set but low for the testing and validation sets, because the model is fitting to noise in the training data.
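As a quick illustration of that symptom (a sketch on small synthetic data, not the course's dataset), an over-sized network trained for many epochs scores a much higher R\(^2\) on the data it has memorized than on held-out data:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from keras.models import Sequential
from keras.layers import Dense

np.random.seed(0)
X = np.random.randn(100, 5)
y = X[:, 0] + np.random.randn(100)  # weak signal plus plenty of noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately over-sized network trained long enough to memorize the noise
model = Sequential()
model.add(Dense(256, input_dim=5, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=300, verbose=0)

print('train R^2:', r2_score(y_train, model.predict(X_train)))
print('test R^2:', r2_score(y_test, model.predict(X_test)))  # usually much lower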
We can work towards preventing overfitting by using dropout. This randomly drops some neurons during the training phase, which helps prevent the network from fitting noise in the training data. keras has a Dropout layer that we can use to accomplish this. We need to set the dropout rate, the fraction of a layer's units dropped during training. This is set as a decimal between 0 and 1 in the Dropout() layer.
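As a rough numpy sketch of what the rate means (this imitates the "inverted dropout" behaviour Keras uses, rather than calling the layer itself): during training, each unit's output is zeroed with probability equal to the rate and the surviving outputs are scaled up by 1 / (1 - rate), while at prediction time values pass through unchanged.

import numpy as np

rng = np.random.default_rng(0)
rate = 0.2  # the dropout rate we would pass to Dropout(0.2)
activations = np.ones(10)

mask = rng.random(10) >= rate                    # keep each unit with probability 0.8
train_output = activations * mask / (1 - rate)   # zeroed or scaled-up activations during training
predict_output = activations                     # unchanged at prediction time

print(train_output)
print(predict_output)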
We're going to go back to the mean squared error loss function for this model.
This exercise is part of the course Machine Learning for Finance in Python.
Exercise instructions
- Add a dropout layer (Dropout()) after the first Dense layer in the model, and use 20% (0.2) as the dropout rate.
- Use the adam optimizer and the mse loss function when compiling the model in .compile().
- Fit the model to the scaled_train_features and train_targets using 25 epochs.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from keras.layers import Dropout
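# Sequential, Dense, plt, scaled_train_features, and train_targets are assumed
# to be preloaded by the exercise environment.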
# Create model with dropout
model_3 = Sequential()
model_3.add(Dense(100, input_dim=scaled_train_features.shape[1], activation='relu'))
model_3.add(____)
model_3.add(Dense(20, activation='relu'))
model_3.add(Dense(1, activation='linear'))
# Fit model with mean squared error loss function
model_3.compile(optimizer=____, loss=____)
history = model_3.fit(____, ____, epochs=____)
plt.plot(history.history['loss'])
plt.title('loss:' + str(round(history.history['loss'][-1], 6)))
plt.show()
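For reference, here is one way the blanks could be filled in following the instructions above (a sketch; it assumes scaled_train_features and train_targets are already loaded, as they are in the exercise environment):

from keras.models import Sequential
from keras.layers import Dense, Dropout
import matplotlib.pyplot as plt

# Create model with dropout
model_3 = Sequential()
model_3.add(Dense(100, input_dim=scaled_train_features.shape[1], activation='relu'))
model_3.add(Dropout(0.2))  # drop 20% of this layer's outputs during training
model_3.add(Dense(20, activation='relu'))
model_3.add(Dense(1, activation='linear'))

# Fit model with mean squared error loss function
model_3.compile(optimizer='adam', loss='mse')
history = model_3.fit(scaled_train_features, train_targets, epochs=25)

plt.plot(history.history['loss'])
plt.title('loss:' + str(round(history.history['loss'][-1], 6)))
plt.show()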