Seen vs. unseen data
Models tend to have higher accuracy on observations they have seen before. In the candy dataset, predicting the popularity of Skittles will likely have higher accuracy than predicting the popularity of Andes Mints, because Skittles is in the dataset and Andes Mints is not.
You've built a model on 50 candies using the dataset X_train. You now need to report how accurate the model is at predicting the popularity of both the 50 candies it was built on and the 35 candies (X_test) it has never seen. You will use the mean absolute error, mae(), as the accuracy metric; a lower error means more accurate predictions.
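To make the metric concrete, here is a minimal plain-Python sketch of what a mean absolute error computes: the average of the absolute differences between true and predicted values. The function name and the three sample values are illustrative assumptions; the exercise's own mae() is provided by the course environment and is not defined on this page.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical popularity scores, made up for illustration only.
actual = [50.0, 60.0, 70.0]
predicted = [48.0, 65.0, 70.0]

# Absolute errors are 2, 5 and 0, so the mean is 7 / 3.
error = mean_absolute_error(actual, predicted)
print("Mean absolute error: {0:.2f}".format(error))
```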
This exercise is part of the course “Model Validation in Python”.
Exercise instructions
- Using X_train and X_test as input data, create arrays of predictions using model.predict().
- Calculate model accuracy on both data the model has seen and data the model has not seen before.
- Use the print statements to print the model error on the seen and unseen data.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The model is fit using X_train and y_train
model.fit(X_train, y_train)
# Create vectors of predictions
train_predictions = model.predict(____)
test_predictions = model.predict(____)
# Train/Test Errors
train_error = mae(y_true=y_train, y_pred=____)
test_error = mae(y_true=y_test, y_pred=____)
# Print the accuracy for seen and unseen data
print("Model error on seen data: {0:.2f}.".format(____))
print("Model error on unseen data: {0:.2f}.".format(____))
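One way the blanks could be filled in is sketched below. Because the candy dataset and the course's pre-built model are not included on this page, the sketch makes several assumptions: it generates synthetic data with a 50/35 split, uses scikit-learn's RandomForestRegressor as a stand-in model, and imports scikit-learn's mean_absolute_error under the name mae. Treat it as an illustration of the pattern rather than the course's reference solution.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae

# Synthetic stand-in for the candy data (assumption): 85 rows, 5 features.
rng = np.random.default_rng(0)
X = rng.random((85, 5))
y = 100 * X[:, 0] + rng.normal(scale=10, size=85)

# 50 rows to build the model on, 35 rows held out, as in the exercise.
X_train, X_test = X[:50], X[50:]
y_train, y_test = y[:50], y[50:]

# Stand-in model (assumption); the exercise supplies its own `model`.
model = RandomForestRegressor(n_estimators=100, random_state=0)

# The model is fit using X_train and y_train
model.fit(X_train, y_train)

# Create vectors of predictions for seen and unseen observations
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

# Train/Test Errors
train_error = mae(y_true=y_train, y_pred=train_predictions)
test_error = mae(y_true=y_test, y_pred=test_predictions)

# Print the error for seen and unseen data
print("Model error on seen data: {0:.2f}.".format(train_error))
print("Model error on unseen data: {0:.2f}.".format(test_error))
```

With a flexible model such as a random forest, the error on the seen data is typically noticeably lower than on the unseen data, which is why the test error is the more honest estimate of how the model will perform on new candies.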
Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.