Exercise

Hold-out

You already know about the danger of overfitting, which occurs when your model learns the training data too well, but then performs poorly when faced with new data.

Because of that, you've been urged to always test your model using data that wasn't previously used for training.

But don't take our word for it: see for yourself!

You will use a dataset consisting of two classes. 60% of the data has been selected for training and stored in X_train and y_train. The remaining 40% is stored in variables X_test and y_test.
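A 60/40 split like the one described above could be produced roughly as follows. This is only a sketch: the exercise supplies X_train, y_train, X_test and y_test ready-made, so the synthetic dataset from make_classification, the sample count and the random_state values are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the exercise's two-class dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, n_classes=2, random_state=0
)

# Keep 60% of the rows for training and hold out the remaining 40%.
# stratify=y keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```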

You will train a RandomForestClassifier() model and see the difference in performance:

  • when it's applied to the very same data used to train it
  • when it's applied to the hold-out data, which comes from the same dataset but was not used for training
Instructions 1/2
  1. Test the model on the same data it used for training.
  2. Test the model on the hold-out dataset, that is, the data the model hasn't seen during training.
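The two steps can be sketched as below. This is one possible approach, not the exercise's reference solution: the synthetic data (including the label noise added through flip_y), the use of accuracy_score as the metric, and the random_state values are all assumptions, since the exercise provides its own pre-split data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical two-class data with some label noise (flip_y), split 60/40.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Fit the model on the training portion only.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Step 1: score the model on the same data it was trained on.
train_accuracy = accuracy_score(y_train, model.predict(X_train))

# Step 2: score the model on the hold-out data it has not seen.
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"Accuracy on training data: {train_accuracy:.2f}")
print(f"Accuracy on hold-out data: {test_accuracy:.2f}")
```

The training score is typically the more flattering of the two, which is why the hold-out score is the one to trust as an estimate of performance on new data.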