KFold cross-validation
When working with ML models, it's essential to evaluate their performance on unseen data while making full use of the available samples. One common technique for this purpose is k-fold cross-validation. In this exercise, you'll explore how the k-fold cross-validation technique splits a dataset into training and testing sets. KFold is imported for you, as well as the heart disease dataset features heart_disease_df_X.
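As a quick illustration of what KFold does, here is a minimal sketch using a small synthetic array (a stand-in, not the course dataset): each fold yields arrays of row indices for the train and test portions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Small synthetic feature matrix standing in for the real dataset
X = np.arange(10).reshape(10, 1)

# 5 folds, shuffled reproducibly
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Each iteration yields index arrays for the train and test portions
for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    print(f"Fold {fold}: train size={len(train_idx)}, test size={len(test_idx)}")
```

With 10 samples and 5 splits, every fold holds out 2 samples for testing and trains on the remaining 8, so each sample appears in a test set exactly once.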
This exercise is part of the course
End-to-End Machine Learning
Exercise instructions
- Create a KFold object with n_splits=5, shuffle=True, and random_state=42.
- Split the data using kfold.split().
- Print out the number of datapoints in the train and test splits.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a KFold object
kfold = ____(____, ____, ____)
# Get the train and test indices from the first split of the shuffled KFold
train_data_split, test_data_split = next(____.____(____))
# Print out the number of datapoints in the train and test splits
print("Number of datapoints in heart_disease_df_X:", ____)
print("Number of training datapoints in split:", ____)
print("Number of testing datapoints in split:", ____)
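One way the blanks above could be filled in is sketched below. Since heart_disease_df_X only exists in the course environment, a synthetic array of the same shape as the UCI heart disease data is used here as a stand-in; the variable name and sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for heart_disease_df_X (303 rows, 13 features;
# the real course dataset is not available outside the exercise)
heart_disease_df_X = np.random.rand(303, 13)

# Create a KFold object
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Get the train and test indices from the first split of the shuffled KFold
train_data_split, test_data_split = next(kfold.split(heart_disease_df_X))

# Print out the number of datapoints overall and in the train/test splits
print("Number of datapoints in heart_disease_df_X:", len(heart_disease_df_X))
print("Number of training datapoints in split:", len(train_data_split))
print("Number of testing datapoints in split:", len(test_data_split))
```

Note that kfold.split() yields index arrays, not the data itself; with 303 rows and 5 folds, the first fold's test set gets 61 indices and the training set the remaining 242 (scikit-learn gives the first n_samples % n_splits folds one extra test sample).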