K-fold cross-validation

You will start by getting hands-on experience in the most commonly used K-fold cross-validation.

The data you'll be working with is from the "Two sigma connect: rental listing inquiries" Kaggle competition. The competition problem is a multi-class classification of the rental listings into 3 classes: low interest, medium interest and high interest. For faster performance, you will work with a subsample consisting of 1,000 observations.

You need to implement a K-fold validation strategy and look at the sizes of each fold obtained. train DataFrame is already available in your workspace.

Diese Übung ist Teil des Kurses

Winning a Kaggle Competition in Python

Kurs anzeigen

Anleitung zur Übung

Create a KFold object with 3 folds.
Loop over each split using the kf object.
For each split select training and testing folds using train_index and test_index.

Interaktive Übung

Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.

# Import KFold
from sklearn.model_selection import KFold

# Create a KFold object
kf = ____(n_splits=____, shuffle=True, random_state=123)

# Loop through each split
fold = 0
for train_index, test_index in ____.____(train):
    # Obtain training and testing folds
    cv_train, cv_test = train.iloc[____], train.iloc[____]
    print('Fold: {}'.format(fold))
    print('CV train shape: {}'.format(cv_train.shape))
    print('Medium interest listings in CV train: {}\n'.format(sum(cv_train.interest_level == 'medium')))
    fold += 1

Code bearbeiten und ausführen