Exercise

K-fold cross-validation

You will start by getting hands-on experience with K-fold cross-validation, the most commonly used validation strategy.

The data you'll be working with comes from the "Two Sigma Connect: Rental Listing Inquiries" Kaggle competition. The competition problem is a multi-class classification of rental listings into 3 classes: low interest, medium interest, and high interest. For faster performance, you will work with a subsample of 1,000 observations.

You need to implement a K-fold validation strategy and look at the size of each fold obtained. The train DataFrame is already available in your workspace.

Instructions
  • Create a KFold object with 3 folds.
  • Loop over each split using the kf object.
  • For each split, select the training and testing folds using train_index and test_index.
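
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise's reference solution: since the workspace's train DataFrame is not available here, the sketch builds a synthetic stand-in with 1,000 rows, and its column names (price, interest_level) are assumptions made only for this example. The shuffle and random_state settings, and the names cv_train, cv_test and fold_sizes, are likewise illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for the workspace's `train` DataFrame.
# Assumption: 1,000 rows; the column names are hypothetical.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "price": rng.integers(1000, 5000, size=1000),
    "interest_level": rng.choice(["low", "medium", "high"], size=1000),
})

# Create a KFold object with 3 folds (shuffling is an optional choice here)
kf = KFold(n_splits=3, shuffle=True, random_state=123)

# Loop over each split and record the fold sizes
fold_sizes = []
for fold, (train_index, test_index) in enumerate(kf.split(train)):
    # Select the training and testing folds by positional index
    cv_train = train.iloc[train_index]
    cv_test = train.iloc[test_index]
    fold_sizes.append((len(cv_train), len(cv_test)))
    print(f"Fold {fold}: train size = {len(cv_train)}, test size = {len(cv_test)}")
```

With 1,000 rows and 3 folds, the rows cannot be divided evenly, so KFold makes the first test fold one row larger: the test folds hold 334, 333 and 333 rows, and the matching training folds hold the remaining 666, 667 and 667 rows.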