Exercise

Cross-validation statistics

You used grid search cross-validation to tune your random forest classifier, and now want to inspect the cross-validation results to check that you did not overfit. In particular, you would like to compute the difference between the mean test score and the mean training score, each averaged across the folds, for every hyperparameter combination. The dataset is available as X_train and y_train and the pipeline as pipe. pandas (as pd) and GridSearchCV have already been imported.

Instructions
100 XP
  • Create a grid search object with three cross-validation folds and ensure it returns training as well as test statistics.
  • Fit the grid search object to the training data.
  • Store the results of the cross-validation, available in the cv_results_ attribute of the fitted CV object, into a dataframe.
  • Print the difference between the column containing the average test score and that containing the average training score.
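
The steps above can be sketched as follows. This is a minimal illustration rather than the exercise's reference solution: the synthetic data, the pipeline step name clf, and the params grid are assumptions standing in for the pre-loaded X_train, y_train, and pipe, which are not shown here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Stand-ins for the exercise's pre-loaded objects (assumed, not the real data).
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)
pipe = Pipeline([("clf", RandomForestClassifier(n_estimators=20, random_state=0))])

# Hypothetical parameter grid; the exercise does not state which values it searches.
params = {"clf__max_depth": [2, 5, 10]}

# Three folds, and return_train_score=True so training scores are recorded too.
grid_search = GridSearchCV(pipe, param_grid=params, cv=3, return_train_score=True)

# Fit the grid search object to the training data.
grid_search.fit(X_train, y_train)

# cv_results_ is a dict of arrays, one entry per parameter combination.
results = pd.DataFrame(grid_search.cv_results_)

# Mean test score minus mean training score for each combination.
score_gap = results["mean_test_score"] - results["mean_train_score"]
print(score_gap)
```

A gap that is strongly negative for a given combination means the model scores much better on the data it was fitted on than on held-out folds, which is the usual sign of overfitting.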