Using KFold indices
You have already created splits
, which contains indices for the candy-data dataset to complete 5-fold cross-validation. To get a better estimate for how well a colleague's random forest model will perform on a new data, you want to run this model on the five different training and validation indices you just created.
In this exercise, you will use these indices to check the accuracy of this model using the five different splits. A for loop has been provided to assist with this process.
This exercise is part of the course
Model Validation in Python
Exercise instructions
- Use
train_index
andval_index
to call the correct indices ofX
andy
when creating training and validation data. - Fit
rfc
using the training dataset - Use
rfc
to create predictions for validation dataset and print the validation accuracy
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
rfc = RandomForestRegressor(n_estimators=25, random_state=1111)
# Access the training and validation indices of splits
for train_index, val_index in splits:
# Setup the training and validation data
X_train, y_train = X[____], y[____]
X_val, y_val = X[____], y[____]
# Fit the random forest model
rfc.____(____, ____)
# Make predictions, and print the accuracy
predictions = rfc.____(____)
print("Split accuracy: " + str(mean_squared_error(y_val, predictions)))