
Time K-fold

Remember the "Store Item Demand Forecasting Challenge", where you are given store-item sales data and have to predict future sales?

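Because this competition uses time series data, time K-fold cross-validation should be applied: each fold trains on earlier dates and validates on later ones, so the model is never evaluated on data that comes before its training data. Your goal is to create this cross-validation strategy and make sure that it works as expected.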
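To see what time K-fold produces, here is a minimal sketch that runs scikit-learn's TimeSeriesSplit on twelve rows assumed to be already in chronological order. The training window expands with every fold, and each test fold sits strictly after its training rows.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve observations, assumed to be sorted in time order already
X = np.arange(12).reshape(-1, 1)

time_kfold = TimeSeriesSplit(n_splits=3)

folds = []
for train_index, test_index in time_kfold.split(X):
    # split() yields integer row positions, not labels or dates
    folds.append((train_index.tolist(), test_index.tolist()))
    print('train:', train_index.tolist(), 'test:', test_index.tolist())
```

With the default settings each test fold has 12 // (3 + 1) = 3 rows, so the training folds grow from 3 to 6 to 9 rows, unlike ordinary K-fold, which would also train on rows that come after the test fold.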

Note that the train DataFrame is already available in your workspace, and that TimeSeriesSplit has been imported from sklearn.model_selection.

This exercise is part of the course

Winning a Kaggle Competition in Python

View Course

Exercise instructions

  • Create a TimeSeriesSplit object with 3 splits.
  • Sort the train data by the "date" column to apply time K-fold.
  • Loop over each time split using the time_kfold object.
  • For each split, select the training and testing folds using train_index and test_index.
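Sorting matters because TimeSeriesSplit only looks at row order, and it also affects how the folds are selected: sort_values keeps the original index labels, while train_index and test_index are positions. The sketch below uses a small hypothetical DataFrame (made-up dates and sales) to show that .iloc picks the intended rows after sorting, whereas .loc would pick rows by their old labels.

```python
import pandas as pd

# Hypothetical toy data: rows are deliberately out of date order
train = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-03', '2017-01-01', '2017-01-04', '2017-01-02']),
    'sales': [30, 10, 40, 20],
})

# Sort by date; the original index labels travel with the rows
train = train.sort_values('date')
print(train.index.tolist())

# The first two rows in time order, as a split would return them
positions = [0, 1]
by_position = train.iloc[positions]['sales'].tolist()  # positional selection
by_label = train.loc[positions]['sales'].tolist()      # label selection (wrong rows here)
print(by_position, by_label)
```

After sorting, the index reads [1, 3, 0, 2], so .iloc returns the sales of January 1 and 2, while .loc returns the rows labelled 0 and 1, which belong to January 3 and 1.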

Hands-on exercise

Completed sample code (train is the DataFrame already loaded in the workspace):

# Import the time-aware splitter (already imported in the exercise workspace)
from sklearn.model_selection import TimeSeriesSplit

# Create TimeSeriesSplit object with 3 splits
time_kfold = TimeSeriesSplit(n_splits=3)

# Sort train data by date so that row order matches time order
train = train.sort_values('date')

# Iterate through each split
fold = 0
for train_index, test_index in time_kfold.split(train):
    # The split indices are row positions, so select the folds with .iloc
    cv_train, cv_test = train.iloc[train_index], train.iloc[test_index]

    print('Fold :', fold)
    print('Train date range: from {} to {}'.format(cv_train.date.min(), cv_train.date.max()))
    print('Test date range: from {} to {}\n'.format(cv_test.date.min(), cv_test.date.max()))
    fold += 1
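To check that the strategy works as expected without reading printed ranges by eye, the sketch below builds a hypothetical stand-in for train (two stores over 30 days, shuffled) and asserts, for every fold, that no training date comes after the first test date. The comparison uses <= rather than <: because TimeSeriesSplit splits by row position, a date with several store-item rows can end one training fold and also start its test fold.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the competition data: 2 stores x 30 days,
# shuffled so the rows start out of date order
rng = np.random.default_rng(0)
dates = pd.date_range('2017-01-01', periods=30, freq='D')
train = pd.DataFrame({
    'date': np.tile(dates, 2),
    'store': np.repeat([1, 2], 30),
    'sales': rng.integers(0, 100, size=60),
}).sample(frac=1, random_state=0)

time_kfold = TimeSeriesSplit(n_splits=3)
train = train.sort_values('date')

checks, straddles = [], []
for fold, (train_index, test_index) in enumerate(time_kfold.split(train)):
    cv_train, cv_test = train.iloc[train_index], train.iloc[test_index]
    # No training date may come after the first test date
    checks.append(cv_train.date.max() <= cv_test.date.min())
    # Flag folds where one date is split between training and testing
    straddles.append(cv_train.date.max() == cv_test.date.min())
    print('Fold {}: train ends {}, test starts {}'.format(
        fold, cv_train.date.max().date(), cv_test.date.min().date()))

assert all(checks), 'a training fold contains dates after its test fold'
```

If a shared boundary date is undesirable, one option is to split on the unique sorted dates instead of on rows and then map those dates back to rows.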