Using KFold indices

You have already created splits, which contains the indices of the candy-data dataset needed for 5-fold cross-validation. To get a better estimate of how your colleague's random forest model will perform on new data, you want to run that model on the five pairs of training and validation indices you just created.

In this exercise, you will use these indices to check the accuracy of the model on the five splits. A for loop is provided to help you with this.
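For context, here is a minimal sketch of how a splits object like the one described above could be built. It is not the course's own setup: X and y are synthetic NumPy arrays standing in for candy-data (the 85-by-11 shape is an assumption), and the shuffle and random_state settings are assumed as well.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for candy-data (assumed shape: 85 rows, 11 features)
rng = np.random.default_rng(1111)
X = rng.random((85, 11))
y = rng.random(85)

# Five folds; shuffle=True and random_state=1111 are assumed settings
kf = KFold(n_splits=5, shuffle=True, random_state=1111)

# kf.split() returns a generator, so store it as a list to reuse the pairs
splits = list(kf.split(X))

# Each element is a (train_index, val_index) pair of integer position arrays
for train_index, val_index in splits:
    print("train rows:", len(train_index), "validation rows:", len(val_index))
```

Every row lands in exactly one validation fold, so the five validation sets together cover the whole dataset.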

This exercise is part of the course

<cours>Model Validation in Python</cours>

Exercise instructions

Use train_index and val_index to select the right rows of X and y when creating the training and validation data.
Fit rfc using the training set.
Use rfc to generate predictions on the validation set, and print the validation accuracy (measured here as mean squared error, so lower is better).

Hands-on interactive exercise

Try this exercise by completing this sample code.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rfc = RandomForestRegressor(n_estimators=25, random_state=1111)

# Access the training and validation indices of splits
for train_index, val_index in splits:
    # Setup the training and validation data
    X_train, y_train = X[____], y[____]
    X_val, y_val = X[____], y[____]
    # Fit the random forest model
    rfc.____(____, ____)
    # Make predictions, and print the accuracy
    predictions = rfc.____(____)
    print("Split accuracy: " + str(mean_squared_error(y_val, predictions)))
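One possible way to fill in the blanks is sketched below. It is not an official solution. The data is synthetic and stands in for candy-data, the KFold settings are assumptions, and the printed label is changed to "Split MSE" because the value is a mean squared error rather than an accuracy score.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in for candy-data (assumption, not the real dataset)
rng = np.random.default_rng(1111)
X = rng.random((85, 11))
y = X[:, 0] * 50 + rng.normal(0, 5, 85)

# Assumed KFold settings; the exercise supplies splits already built
splits = list(KFold(n_splits=5, shuffle=True, random_state=1111).split(X))

rfc = RandomForestRegressor(n_estimators=25, random_state=1111)

fold_mse = []
for train_index, val_index in splits:
    # Select the training and validation rows by position
    X_train, y_train = X[train_index], y[train_index]
    X_val, y_val = X[val_index], y[val_index]
    # Refit the model from scratch on this fold's training rows
    rfc.fit(X_train, y_train)
    # Predict on the held-out rows and record the error
    predictions = rfc.predict(X_val)
    mse = mean_squared_error(y_val, predictions)
    fold_mse.append(mse)
    print("Split MSE: " + str(mse))

# Averaging over folds gives a steadier estimate than any single split
print("Mean MSE: " + str(np.mean(fold_mse)))
```

Indexing with X[train_index] works here because X and y are NumPy arrays. With pandas objects, the same positional selection would use X.iloc[train_index] and y.iloc[train_index].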


Skill level: Intermediate


Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation Exercise 2: Modeling steps Exercise 3: Seen vs. unseen data Exercise 4: Regression models Exercise 5: Set parameters and fit a model Exercise 6: Feature importances Exercise 7: Classification models Exercise 8: Classification predictions Exercise 9: Reusing model parameters Exercise 10: Random forest classifier

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.

Exercise 1: Creating train, test, and validation datasets Exercise 2: Create one holdout set Exercise 3: Create two holdout sets Exercise 4: Why use holdout sets Exercise 5: Accuracy metrics: regression models Exercise 6: Mean absolute error Exercise 7: Mean squared error Exercise 8: Performance on data subsets Exercise 9: Classification metrics Exercise 10: Confusion matrices Exercise 11: Confusion matrices, again Exercise 12: Precision vs. recall Exercise 13: The bias-variance tradeoff Exercise 14: Error due to under/over-fitting Exercise 15: Am I underfitting?

Holdout sets are a great start to model validation. However, using a single train and test set is often not enough. Cross-validation is considered the gold standard when it comes to validating model performance and is almost always used when tuning model hyperparameters. This chapter focuses on performing cross-validation to validate model performance.

Exercise 1: The problems with holdout sets Exercise 2: Two samples Exercise 3: Potential problems Exercise 4: Cross-validation Exercise 5: scikit-learn's `KFold()` Exercise 6: Using KFold indices (current exercise) Exercise 7: sklearn's cross_val_score() Exercise 8: scikit-learn's methods Exercise 9: Implement cross_val_score() Exercise 10: Leave-one-out cross-validation (LOOCV) Exercise 11: When to use LOOCV Exercise 12: Leave-one-out cross-validation

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.

Exercise 1: Introduction to hyperparameter tuning Exercise 2: Creating Hyperparameters Exercise 3: Running a model using ranges Exercise 4: RandomizedSearchCV Exercise 5: Preparing for RandomizedSearch Exercise 6: Implementing RandomizedSearchCV Exercise 7: Selecting your final model Exercise 8: Best classification accuracy Exercise 9: Selecting the best precision model Exercise 10: Course completed!