
Cross-validation statistics

You used GridSearchCV to tune your random forest classifier, and now want to inspect the cross-validation results to check that you did not overfit. In particular, you would like to compute the difference between the mean test score and the mean training score for each hyperparameter combination. The dataset is available as X_train and y_train, the pipeline as pipe, and a number of modules are pre-loaded, including pandas as pd and GridSearchCV().

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

  • Create a grid search object with three cross-validation folds and ensure it returns training as well as test statistics.
  • Fit the grid search object to the training data.
  • Store the results of the cross-validation, available in the cv_results_ attribute of the fitted CV object, into a dataframe.
  • Print the difference between the column containing the average test score and that containing the average training score.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit your pipeline using GridSearchCV with three folds
grid_search = GridSearchCV(
  pipe, params, ____=3, return_train_score=____)

# Fit the grid search
gs = grid_search.____(____, ____)

# Store the results of CV into a pandas dataframe
results = pd.____(gs.____)

# Print the difference between mean test and training scores
print(
  results[____] - results['mean_train_score'])
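If you get stuck, the following is a minimal, self-contained sketch of a completed solution. Note that the dataset, pipeline, and parameter grid below are stand-ins of my own choosing (make_classification, a one-step Pipeline, and a small n_estimators grid); in the exercise itself, X_train, y_train, pipe, and params are pre-loaded for you.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Stand-ins for the pre-loaded exercise objects
X_train, y_train = make_classification(n_samples=200, random_state=0)
pipe = Pipeline([('clf', RandomForestClassifier(random_state=0))])
params = {'clf__n_estimators': [5, 10]}

# Three cross-validation folds; keep training scores alongside test scores
grid_search = GridSearchCV(pipe, params, cv=3, return_train_score=True)

# Fit the grid search
gs = grid_search.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per hyperparameter combination
results = pd.DataFrame(gs.cv_results_)

# A large gap between training and test scores suggests overfitting
print(results['mean_test_score'] - results['mean_train_score'])
```

Without return_train_score=True, the mean_train_score column is not computed, and the final subtraction would raise a KeyError.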