Seen vs. unseen data

Models tend to have higher accuracy on observations they have seen before. In the candy dataset, predicting the popularity of Skittles will likely be more accurate than predicting the popularity of Andes Mints, because Skittles is in the dataset and Andes Mints is not.

You've built a model on 50 candies using the dataset X_train, and you need to report how accurately it predicts the popularity of those 50 candies and of the 35 candies (X_test) it has never seen. You will use the mean absolute error, mae(), as the accuracy metric. Because it measures error, a lower value means better predictions.

Using X_train and X_test as input data, create arrays of predictions using model.predict().
Calculate the model's accuracy with mae() on both the data it has seen and the data it has not seen before.
Use the print statements to print the errors for the seen and unseen data.
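
The three steps above can be sketched as follows. This is a minimal illustration, not the exercise's own solution: the candy data is not reproduced here, so the example uses synthetic data with the same 50/35 split. The RandomForestRegressor, the random seed, the names y_train and y_test, and the import of scikit-learn's mean_absolute_error under the alias mae are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae  # assumed alias

# Synthetic stand-in for the candy data (assumption: 85 rows, 5 features).
rng = np.random.default_rng(1111)
X = rng.random((85, 5))
y = 100 * X[:, 0] + rng.normal(0, 10, size=85)

# 50 "seen" rows for training, 35 "unseen" rows held out.
X_train, X_test = X[:50], X[50:]
y_train, y_test = y[:50], y[50:]

# Hypothetical model choice; the exercise provides an already fitted model.
model = RandomForestRegressor(n_estimators=100, random_state=1111)
model.fit(X_train, y_train)

# Step 1: create arrays of predictions for seen and unseen data.
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

# Step 2: compute the mean absolute error on each set.
train_error = mae(y_true=y_train, y_pred=train_predictions)
test_error = mae(y_true=y_test, y_pred=test_predictions)

# Step 3: report both errors.
print("Model error on seen data: {0:.2f}.".format(train_error))
print("Model error on unseen data: {0:.2f}.".format(test_error))
```

If the error on seen data is much lower than on unseen data, the model is probably fitting the training observations more closely than it generalizes, which is why the unseen-data error is the more honest one to report.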
