K-fold cross-validation
You will start by getting hands-on experience with K-fold cross-validation, the most commonly used validation strategy.
The data you'll be working with is from the "Two Sigma Connect: Rental Listing Inquiries" Kaggle competition. The competition problem is a multi-class classification of rental listings into 3 classes: low interest, medium interest and high interest. For faster performance, you will work with a subsample consisting of 1,000 observations.
You need to implement a K-fold validation strategy and look at the sizes of each fold obtained. The `train` DataFrame is already available in your workspace.
This exercise is part of the course
Winning a Kaggle Competition in Python
Exercise instructions
- Create a `KFold` object with 3 folds.
- Loop over each split using the `kf` object.
- For each split select training and testing folds using `train_index` and `test_index`.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import KFold
from sklearn.model_selection import KFold
# Create a KFold object
kf = ____(n_splits=____, shuffle=True, random_state=123)
# Loop through each split
fold = 0
for train_index, test_index in ____.____(train):
    # Obtain training and testing folds
    cv_train, cv_test = train.iloc[____], train.iloc[____]
    print('Fold: {}'.format(fold))
    print('CV train shape: {}'.format(cv_train.shape))
    print('Medium interest listings in CV train: {}\n'.format(sum(cv_train.interest_level == 'medium')))
    fold += 1
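If you want to check your answer, here is one possible completed version of the exercise. Since the competition's `train` DataFrame is not available outside the course workspace, this sketch substitutes a small hypothetical stand-in with an `interest_level` column; with the real 1,000-row subsample the fold sizes would differ accordingly.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical 9-row stand-in for the competition's `train` DataFrame
train = pd.DataFrame({
    'interest_level': ['low', 'medium', 'high'] * 3,
    'price': [1500, 2300, 3100, 1800, 2700, 900, 2100, 1600, 2500],
})

# Create a KFold object with 3 folds
kf = KFold(n_splits=3, shuffle=True, random_state=123)

# Loop through each split
fold = 0
for train_index, test_index in kf.split(train):
    # Obtain training and testing folds by positional index
    cv_train, cv_test = train.iloc[train_index], train.iloc[test_index]
    print('Fold: {}'.format(fold))
    print('CV train shape: {}'.format(cv_train.shape))
    print('Medium interest listings in CV train: {}\n'.format(sum(cv_train.interest_level == 'medium')))
    fold += 1
```

With 9 rows and 3 folds, each CV training fold holds 6 observations and each testing fold holds 3; `shuffle=True` with a fixed `random_state` makes the split reproducible.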