KFold cross-validation
When working with ML models, it's essential to evaluate their performance on unseen data while making full use of the available samples. One common technique for this purpose is k-fold cross-validation. In this exercise, you'll explore how the k-fold cross-validation technique splits a dataset into training and testing sets. KFold is imported for you, as well as the heart disease dataset features heart_disease_df_X.
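As a quick illustration of what KFold does, here is a minimal sketch using a small synthetic array (a stand-in, not the course dataset): each fold yields arrays of row indices for the train and test portions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Small synthetic feature matrix standing in for the real dataset
X = np.arange(10).reshape(10, 1)

# 5 folds, shuffled reproducibly
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Each iteration yields index arrays for the train and test portions
for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    print(f"Fold {fold}: train size={len(train_idx)}, test size={len(test_idx)}")
```

With 10 samples and 5 splits, every fold holds out 2 samples for testing and trains on the remaining 8, so each sample appears in a test set exactly once.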
This exercise is part of the course
End-to-End Machine Learning
Exercise instructions
- Create a KFold object with n_splits=5, shuffle=True, and random_state=42.
- Split the data using kfold.split().
- Print out the number of datapoints in the train and test splits.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a KFold object
kfold = ____(____, ____, ____)
# Get the train and test indices from the first split of the shuffled KFold
train_data_split, test_data_split = next(____.____(____))
# Print out the number of datapoints in the train and test splits
print("Number of datapoints in heart_disease_df_X:", ____)
print("Number of training datapoints in split:", ____)
print("Number of testing datapoints in split:", ____)
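One way the blanks above could be filled in is sketched below. Since heart_disease_df_X only exists in the course environment, a synthetic array of the same shape as the UCI heart disease data is used here as a stand-in; the variable name and sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for heart_disease_df_X (303 rows, 13 features;
# the real course dataset is not available outside the exercise)
heart_disease_df_X = np.random.rand(303, 13)

# Create a KFold object
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Get the train and test indices from the first split of the shuffled KFold
train_data_split, test_data_split = next(kfold.split(heart_disease_df_X))

# Print out the number of datapoints overall and in the train/test splits
print("Number of datapoints in heart_disease_df_X:", len(heart_disease_df_X))
print("Number of training datapoints in split:", len(train_data_split))
print("Number of testing datapoints in split:", len(test_data_split))
```

Note that kfold.split() yields index arrays, not the data itself; with 303 rows and 5 folds, the first fold's test set gets 61 indices and the training set the remaining 242 (scikit-learn gives the first n_samples % n_splits folds one extra test sample).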