scikit-learn's KFold()
You just finished running a colleague's code that creates a random forest model and calculates an out-of-sample accuracy. You noticed that your colleague's code did not set a random state, and the errors you found were completely different from the errors your colleague reported.
To get a better estimate for how accurate this random forest model will be on new data, you have decided to generate some indices to use for KFold cross-validation.
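As a rough illustration of why the random state matters, the sketch below checks that a shuffled KFold with a fixed seed returns the same folds on every call, and that each sample appears in exactly one validation fold. The array `X_demo` and the helper `validation_folds` are hypothetical names made up for this example; they are not part of the exercise.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in data: 10 samples, 2 features
X_demo = np.arange(20).reshape(10, 2)

def validation_folds(seed):
    """Return the validation indices of each fold for a given seed."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [val.tolist() for _, val in kf.split(X_demo)]

# Same seed: the shuffled folds are identical on every call
same_seed_matches = validation_folds(1111) == validation_folds(1111)
print(same_seed_matches)

# Every sample lands in exactly one validation fold
all_val = sorted(i for fold in validation_folds(1111) for i in fold)
print(all_val == list(range(10)))
```

Without a fixed `random_state`, a shuffled split will generally differ from run to run, which is one plausible reason two people can report different errors from the same code.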
This exercise is part of the course
Model Validation in Python
Exercise instructions
- Call KFold() to split the data using five splits, shuffling, and a random state of 1111.
- Use the split() method of KFold on X.
- Print the number of indices in both the train and validation indices lists.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from sklearn.model_selection import KFold
# Use KFold
kf = KFold(____, ____, ____)
# Create splits
splits = kf.____(____)
# Print the number of indices
for train_index, val_index in splits:
    print("Number of training indices: %s" % len(____))
    print("Number of validation indices: %s" % len(____))
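One possible way to fill in the blanks is sketched below. The exercise's real X is not shown on this page, so the code assumes a synthetic feature matrix with 100 rows; with a different dataset the printed counts will differ. The list `fold_sizes` is an extra added here only to make the result easy to inspect.

```python
import numpy as np
from sklearn.model_selection import KFold

# Assumption: X stands in for the exercise's feature matrix (100 rows, 4 columns)
X = np.random.RandomState(0).rand(100, 4)

# Use KFold: five splits, shuffling, fixed random state
kf = KFold(n_splits=5, shuffle=True, random_state=1111)

# Create splits (a generator of (train_index, val_index) array pairs)
splits = kf.split(X)

# Print the number of indices in each fold
fold_sizes = []
for train_index, val_index in splits:
    print("Number of training indices: %s" % len(train_index))
    print("Number of validation indices: %s" % len(val_index))
    fold_sizes.append((len(train_index), len(val_index)))
```

With 100 rows and five splits, each fold holds out 20 rows for validation and trains on the remaining 80. Keyword arguments are used here because recent scikit-learn versions accept only n_splits positionally.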