Seen vs. unseen data

Models tend to have higher accuracy on observations they have seen before. In the candy dataset, a prediction of the popularity of Skittles will likely be more accurate than a prediction of the popularity of Andes Mints, because Skittles is in the dataset and Andes Mints is not.

You've built a model based on 50 candies using the dataset X_train and need to report how accurate the model is at predicting the popularity of the 50 candies it was built on (X_train) and of the 35 candies (X_test) it has never seen. You will use the mean absolute error, mae(), as the accuracy metric; a lower error means a more accurate model.
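As a rough illustration of what the mean absolute error measures, the sketch below computes it by hand on a few made-up popularity scores. The helper name mean_abs_error and the numbers are hypothetical and are not taken from the candy dataset; the exercise's mae() is assumed to compute the same quantity (as scikit-learn's mean_absolute_error does).

```python
# Hypothetical popularity scores, invented purely for illustration
y_true = [60.0, 45.0, 80.0, 30.0]
y_pred = [55.0, 50.0, 70.0, 30.0]


def mean_abs_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Absolute errors are 5, 5, 10 and 0, so their mean is 5.0
example_error = mean_abs_error(y_true, y_pred)
print("Mean absolute error: {0:.2f}".format(example_error))
```

Because the error is expressed in the same units as the target, a value of 5.00 here would mean the predictions are off by five popularity points on average.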

This is a part of the course

“Model Validation in Python”

Exercise instructions

  • Using X_train and X_test as input data, create arrays of predictions using model.predict().
  • Calculate the model error with mae() on both the data the model has seen (training) and the data it has not seen before (test).
  • Use the print statements to print the model error on the seen and unseen data.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# The model is fit using X_train and y_train
model.fit(X_train, y_train)

# Create vectors of predictions
train_predictions = model.predict(____)
test_predictions = model.predict(____)

# Train/Test Errors
train_error = mae(y_true=y_train, y_pred=____)
test_error = mae(y_true=y_test, y_pred=____)

# Print the error for seen and unseen data
print("Model error on seen data: {0:.2f}.".format(____))
print("Model error on unseen data: {0:.2f}.".format(____))
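To see the seen-versus-unseen gap outside the exercise environment, here is a minimal self-contained sketch on synthetic data. Everything in it is an assumption made for illustration: the data is randomly generated rather than the candy dataset, the model is a scikit-learn RandomForestRegressor, and mae is taken to be an alias for sklearn.metrics.mean_absolute_error. Only the 50/35 split mirrors the exercise.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the candy data: 85 rows, 5 numeric features
rng = np.random.default_rng(0)
X = rng.normal(size=(85, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=85)

# Hold out 35 rows the model never sees, leaving 50 rows for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=35, random_state=1111
)

# Assumed model type; the exercise supplies its own pre-built model
model = RandomForestRegressor(n_estimators=100, random_state=1111)
model.fit(X_train, y_train)

# Predict on data the model was fit on (seen) and on held-out data (unseen)
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

train_error = mae(y_true=y_train, y_pred=train_predictions)
test_error = mae(y_true=y_test, y_pred=test_predictions)

print("Model error on seen data: {0:.2f}.".format(train_error))
print("Model error on unseen data: {0:.2f}.".format(test_error))
```

In a run like this the error on the training rows should come out noticeably lower than the error on the held-out rows, which is why the unseen-data error is usually the more honest estimate of how the model will perform on new observations.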

This exercise is part of the course

Model Validation in Python

Skill level: Intermediate
Rating: 4.0+ (2 reviews)

Learn the basics of model validation, validation techniques, and begin creating validated and high performing models.

Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

  • Exercise 1: Introduction to model validation
  • Exercise 2: Modeling steps
  • Exercise 3: Seen vs. unseen data
  • Exercise 4: Regression models
  • Exercise 5: Set parameters and fit a model
  • Exercise 6: Feature importances
  • Exercise 7: Classification models
  • Exercise 8: Classification predictions
  • Exercise 9: Reusing model parameters
  • Exercise 10: Random forest classifier
