In-sample and out-of-sample performance

Does a more sophisticated model always perform better? As we discussed in the video, that's only half the truth.

Overfitted models capture the structure of their training set almost perfectly, noise included, but fail to generalize to new data. That's a bummer! At the end of the day, the main purpose of a predictive model is to perform well on new data, right? Go investigate!

The last model from the previous exercise, complex_model, is pre-loaded, along with your training and test data (chocolate_train and chocolate_test).

1. Use complex_model to predict the training set grades, add these predictions to the original training data, and calculate their mean absolute error.

2. Adapt your code to predict test set grades, add these predictions to the original test data, and calculate the mean absolute error.
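The two steps above can be sketched in a few lines. This is only an illustration in plain Python, not the exercise's solution: the pre-loaded complex_model, chocolate_train, and chocolate_test are not reproduced here, so the sketch uses hypothetical stand-ins (a `MemorizingModel` class, synthetic rows, and assumed column names `cocoa_percent` and `final_grade`). It shows the pattern of predicting, attaching the predictions to the original rows, and comparing the mean absolute error in-sample and out-of-sample.

```python
import random


def mean_absolute_error(truth, estimate):
    """Average of |truth - estimate| over all rows."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


class MemorizingModel:
    """Hypothetical stand-in for an overfitted model.

    It memorizes the training rows and predicts the grade of the
    nearest training row (a 1-nearest-neighbour lookup).
    """

    def fit(self, x, y):
        self.x, self.y = list(x), list(y)
        return self

    def predict(self, x_new):
        preds = []
        for value in x_new:
            nearest = min(range(len(self.x)), key=lambda i: abs(self.x[i] - value))
            preds.append(self.y[nearest])
        return preds


def make_rows(n):
    """Synthetic data (an assumption): grade depends weakly on cocoa plus noise."""
    rows = []
    for _ in range(n):
        cocoa = random.uniform(50, 100)
        grade = 2.5 + 0.01 * cocoa + random.gauss(0, 0.3)
        rows.append({"cocoa_percent": cocoa, "final_grade": grade})
    return rows


def add_predictions(model, rows):
    """Return copies of the rows with a new 'pred' column attached."""
    preds = model.predict([r["cocoa_percent"] for r in rows])
    return [dict(r, pred=p) for r, p in zip(rows, preds)]


random.seed(1)
train = make_rows(200)  # stand-in for the training data
test = make_rows(100)   # stand-in for the test data

model = MemorizingModel().fit(
    [r["cocoa_percent"] for r in train],
    [r["final_grade"] for r in train],
)

# Step 1: in-sample performance (predict on the data the model was fit on)
train_results = add_predictions(model, train)
mae_train = mean_absolute_error(
    [r["final_grade"] for r in train_results],
    [r["pred"] for r in train_results],
)

# Step 2: out-of-sample performance (same code, applied to unseen data)
test_results = add_predictions(model, test)
mae_test = mean_absolute_error(
    [r["final_grade"] for r in test_results],
    [r["pred"] for r in test_results],
)

print("in-sample MAE:", mae_train)
print("out-of-sample MAE:", mae_test)
```

Because the stand-in model simply looks up its own training rows, its in-sample error is zero by construction, while the error on unseen rows is clearly larger. A gap of this kind between the two numbers is the signature of overfitting that the exercise asks you to look for.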
