Time K-fold
Remember the "Store Item Demand Forecasting Challenge" where you are given store-item sales data, and have to predict future sales?
It's a competition with time series data. So, time K-fold cross-validation should be applied. Your goal is to create this cross-validation strategy and make sure that it works as expected.
Note that the train
DataFrame is already available in your workspace, and that TimeSeriesSplit
has been imported from sklearn.model_selection
.
This exercise is part of the course
Winning a Kaggle Competition in Python
Exercise instructions
- Create a
TimeSeriesSplit
object with 3 splits. - Sort the train data by "date" column to apply time K-fold.
- Loop over each time split using
time_kfold
object. - For each split select training and testing folds using
train_index
andtest_index
.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create TimeSeriesSplit object
time_kfold = TimeSeriesSplit(n_splits=____)
# Sort train data by date
train = train.sort_values(____)
# Iterate through each split
fold = 0
for train_index, test_index in ____.____(____):
cv_train, cv_test = ____.____[____], ____.____[____]
print('Fold :', fold)
print('Train date range: from {} to {}'.format(cv_train.date.min(), cv_train.date.max()))
print('Test date range: from {} to {}\n'.format(cv_test.date.min(), cv_test.date.max()))
fold += 1