Overall validation score
Now it's time to get the actual model performance using cross-validation! How does our store item demand prediction model perform?
Your task is to compute the Mean Squared Error (MSE) on each fold separately and then combine these results into a single number.
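In symbols, with K folds where fold k has n_k validation rows, the per-fold score and the combined score are:

$$\mathrm{MSE}_k = \frac{1}{n_k}\sum_{i \in \text{fold } k} \left(y_i - \hat{y}_i\right)^2, \qquad \overline{\mathrm{MSE}} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{MSE}_k$$

Here K = 3. Since TimeSeriesSplit gives every validation fold the same number of rows by default, this simple average equals the MSE computed over all validation rows pooled together.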
For simplicity, you're given the get_fold_mse() function, which fits a Random Forest model on each cross-validation split and returns a list of MSE scores, one per fold. get_fold_mse() accepts two arguments: train and a TimeSeriesSplit object.
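To make the helper less of a black box, here is a hypothetical sketch of what get_fold_mse() might look like. The exercise does not show its body, so the feature columns ('store', 'item'), the target column 'sales', and the Random Forest settings below are assumptions; the synthetic data frame exists only so the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit


def get_fold_mse(train, kf):
    """Hypothetical sketch: fit a Random Forest on each training window
    produced by kf and return the validation MSE for every fold."""
    # Assumed feature and target columns (not confirmed by the exercise)
    features, target = ['store', 'item'], 'sales'
    mse_scores = []
    for train_index, test_index in kf.split(train):
        # TimeSeriesSplit yields positional indices, so select rows with iloc
        fold_train, fold_test = train.iloc[train_index], train.iloc[test_index]
        model = RandomForestRegressor(n_estimators=10, random_state=123)
        model.fit(fold_train[features], fold_train[target])
        predictions = model.predict(fold_test[features])
        mse_scores.append(mean_squared_error(fold_test[target], predictions))
    return mse_scores


# Synthetic stand-in for the competition data (illustrative only)
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({
    'date': pd.date_range('2017-01-01', periods=n, freq='D'),
    'store': rng.integers(1, 4, size=n),
    'item': rng.integers(1, 6, size=n),
})
toy['sales'] = 10 + 2 * toy['store'] + toy['item'] + rng.normal(0, 1, size=n)
toy = toy.sort_values('date')

scores = get_fold_mse(toy, TimeSeriesSplit(n_splits=3))
print(len(scores), np.mean(scores))
```

Using iloc rather than loc is deliberate: sort_values('date') keeps the original index labels, so label-based selection with positional indices could pick the wrong rows.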
This exercise is part of the course Winning a Kaggle Competition in Python.
from sklearn.model_selection import TimeSeriesSplit
import numpy as np
# Sort train data by date
train = train.sort_values('date')
# Initialize 3-fold time cross-validation
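kf = TimeSeriesSplit(n_splits=3)
# Get MSE scores for each cross-validation split
# (get_fold_mse() is provided by the exercise and returns one MSE per fold)
mse_scores = get_fold_mse(train, kf)
# Average the fold scores into a single validation score
print('Mean validation MSE: {:.5f}'.format(np.mean(mse_scores)))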
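Sorting by date comes first because TimeSeriesSplit splits rows purely by position: each fold trains on an expanding window of earlier rows and validates on the block that immediately follows. A small sketch on twelve placeholder rows (not the competition data) shows the layout for n_splits=3:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve rows standing in for a date-sorted training set (illustrative only)
X = np.arange(12).reshape(-1, 1)

kf = TimeSeriesSplit(n_splits=3)
folds = list(kf.split(X))
for fold, (train_index, test_index) in enumerate(folds, start=1):
    # Each training window ends right before its validation window starts
    print(f'Fold {fold}: train={train_index.tolist()} test={test_index.tolist()}')

# Fold 1: train=[0, 1, 2] test=[3, 4, 5]
# Fold 2: train=[0, 1, 2, 3, 4, 5] test=[6, 7, 8]
# Fold 3: train=[0, 1, 2, 3, 4, 5, 6, 7, 8] test=[9, 10, 11]
```

The model never validates on rows that come before its training data, which mimics forecasting the future from the past. Note that the earliest rows (0 to 2 here) are only ever used for training.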