scikit-learn's `KFold()`

You have just run a colleague's code that builds a random forest model and computes an out-of-sample accuracy. You noticed that your colleague's code did not set a random state, and the errors you found were completely different from the ones your colleague reported.
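The discrepancy comes from randomness inside the random forest itself. Each tree is grown on a bootstrap sample with randomly chosen candidate features, so without a fixed `random_state` two runs can produce different forests and different accuracies. The sketch below is a minimal illustration on synthetic data from `make_classification`, which stands in for the colleague's dataset. `holdout_accuracy` is a hypothetical helper. Fitting twice with the same `random_state` should give identical accuracy, while different seeds may not.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the colleague's dataset (assumption for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
# The train/test split is also random, so it gets a fixed seed too
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)


def holdout_accuracy(seed):
    """Hypothetical helper: fit a forest with the given random_state, return test accuracy."""
    rfc = RandomForestClassifier(n_estimators=25, random_state=seed)
    rfc.fit(X_train, y_train)
    return accuracy_score(y_test, rfc.predict(X_test))


# Same seed: identical forests, therefore identical accuracy
acc_a = holdout_accuracy(1111)
acc_b = holdout_accuracy(1111)
print(acc_a, acc_b)

# Different seeds can yield different accuracies on the same split
print([holdout_accuracy(seed) for seed in range(5)])
```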

To get a better estimate of how accurate this random forest model will be on new data, you have decided to generate some indices to use for KFold cross-validation.
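To see what `KFold` produces, here is a small sketch on a hypothetical 10-row toy array, `X_toy`. `split()` is a generator that yields one pair of integer index arrays per fold. With `n_splits=5`, each validation fold holds 2 of the 10 rows, and together the validation folds cover every row exactly once. Re-creating the splitter with the same `random_state` reproduces the same shuffled folds.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy feature matrix: 10 rows, 2 columns
X_toy = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=5, shuffle=True, random_state=1111)
folds = list(kf.split(X_toy))  # materialize the generator so it can be reused

for fold, (train_index, val_index) in enumerate(folds):
    print(f"Fold {fold}: train={train_index}, validation={val_index}")

# Every row appears in exactly one validation fold
all_val = np.sort(np.concatenate([val for _, val in folds]))
print(all_val)  # → [0 1 2 3 4 5 6 7 8 9]

# A new KFold with the same random_state yields the same validation folds
folds_again = list(KFold(n_splits=5, shuffle=True, random_state=1111).split(X_toy))
same = all(np.array_equal(v1, v2) for (_, v1), (_, v2) in zip(folds, folds_again))
print(same)
```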

This exercise is part of the course Model Validation in Python.


Exercise instructions

Create a KFold() object that splits the data into five folds, with shuffling and a random state of 1111.
Call the split() method of the KFold object on X.
Print the number of indices in both the training index array and the validation index array for each split.


Try to solve this exercise by completing the sample code.

```python
from sklearn.model_selection import KFold

# Use KFold
kf = KFold(____, ____, ____)

# Create splits
splits = kf.____(____)

# Print the number of indices
for train_index, val_index in splits:
    print("Number of training indices: %s" % len(____))
    print("Number of validation indices: %s" % len(____))
```
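Once the indices exist, one way to turn them into the accuracy estimate described above is to fit a fresh model on each fold's training rows and score it on that fold's validation rows. The sketch below assumes `X` and `y` are NumPy arrays. Here they are generated synthetically with `make_classification` and are not the exercise's real data. It uses a `RandomForestClassifier` with a fixed `random_state`, and the tree count of 25 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Synthetic stand-in for the exercise's X and y (assumption: binary classification)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=1111)

fold_accuracies = []
for train_index, val_index in kf.split(X):
    # Fit only on this fold's training rows
    rfc = RandomForestClassifier(n_estimators=25, random_state=1111)
    rfc.fit(X[train_index], y[train_index])
    # Score on this fold's held-out validation rows
    preds = rfc.predict(X[val_index])
    fold_accuracies.append(accuracy_score(y[val_index], preds))

print("Per-fold accuracy:", np.round(fold_accuracies, 3))
print("Mean accuracy: %.3f" % np.mean(fold_accuracies))
```

If `X` is a pandas DataFrame instead, select rows positionally with `X.iloc[train_index]`, because `KFold` returns positional indices. The spread of the per-fold accuracies is also useful. It shows how much a single holdout estimate could have varied.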
