Cross-validation statistics

You used grid search CV to tune your random forest classifier, and now want to inspect the cross-validation results to check that you did not overfit. In particular, for each parameter combination you would like to subtract the mean training score from the mean test score, where each mean is taken across the folds. The dataset is available as X_train and y_train, the pipeline as pipe, and a number of modules are pre-loaded, including pandas as pd and GridSearchCV.

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Create a grid search object with three cross-validation folds and ensure it returns training as well as test statistics.
  • Fit the grid search object to the training data.
  • Store the results of the cross-validation, available in the cv_results_ attribute of the fitted CV object, into a dataframe.
  • Print the difference between the column containing the average test score and that containing the average training score.
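The steps above can be sketched end to end on synthetic data. This is only an illustration under stated assumptions, not the exercise environment: the data from make_classification, the single-step pipeline, and the max_depth grid are hypothetical stand-ins for the pre-loaded X_train, y_train, pipe, and params.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic data standing in for the course's X_train / y_train.
X_train, y_train = make_classification(
    n_samples=200, n_features=10, random_state=0)

# Assumption: a hypothetical one-step pipeline and parameter grid.
pipe = Pipeline([
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])
params = {"clf__max_depth": [2, 5, None]}

# Three folds; return_train_score=True adds the training-score columns.
grid_search = GridSearchCV(pipe, params, cv=3, return_train_score=True)

# fit() returns the fitted search object itself.
gs = grid_search.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per parameter combination.
results = pd.DataFrame(gs.cv_results_)

# Test minus train: strongly negative values suggest overfitting.
gap = results["mean_test_score"] - results["mean_train_score"]
print(gap)
```

Each row of results corresponds to one parameter combination, so gap has one value per candidate rather than one per fold. The per-fold scores are stored in separate columns such as split0_test_score.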

Interactive exercise

Complete the sample code to finish this exercise successfully.

# Fit your pipeline using GridSearchCV with three folds
grid_search = GridSearchCV(
  pipe, params, ____=3, return_train_score=____)

# Fit the grid search
gs = grid_search.____(____, ____)

# Store the results of CV into a pandas dataframe
results = pd.____(gs.____)

# Print the difference between mean test and training scores
print(
  results[____]-results['mean_train_score'])