Error due to under/overfitting

The candy dataset is prone to overfitting. With only 85 observations, setting aside 20% for the test dataset costs you a lot of valuable data that could otherwise be used to train the model. Imagine a scenario in which most of the chocolate candies end up in the training data and very few in the holdout sample. Our model might learn only that chocolate is a crucial factor and fail to notice that other attributes matter too. In this exercise you will see how using too many features (columns) in a random forest model can lead to overfitting.
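To make the holdout scenario concrete, here is a minimal sketch. It does not use the real candy data: the 85-row binary `chocolate` column is a randomly generated stand-in, assumed only for illustration. It shows that a 20% holdout on 85 rows leaves 17 test rows, and that the share of chocolate candies in such a small holdout can shift from one random split to the next.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the candy data: 85 rows, one binary "chocolate" flag.
rng = np.random.default_rng(0)
chocolate = rng.integers(0, 2, size=85)

holdout_shares = []
for seed in range(5):
    # 20% of 85 rows goes to the holdout sample, the rest to training.
    train, test = train_test_split(chocolate, test_size=0.2, random_state=seed)
    holdout_shares.append(test.mean())
    print(f"seed={seed}: train rows={len(train)}, test rows={len(test)}, "
          f"chocolate share in holdout={test.mean():.2f}")
```

With so few test rows, a handful of candies moving between the two samples is enough to change what the model sees and how it is scored.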

A feature is a column of the data that a decision tree can use. The max_features parameter limits the number of features considered when looking for the best split.
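As a rough illustration of what max_features controls, the sketch below fits a small forest on synthetic data (an assumption, not the candy dataset) and inspects the fitted attributes. The parameter values are chosen arbitrarily for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption, not the candy dataset): 85 rows, 6 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(85, 6))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=85)

# Each split in each tree considers at most 2 randomly chosen columns.
rfr = RandomForestRegressor(n_estimators=10, max_features=2, random_state=0)
rfr.fit(X, y)

print("columns in the data:", rfr.n_features_in_)
print("columns considered per split:", rfr.estimators_[0].max_features_)
```

Every tree still has access to all six columns overall. The limit applies to each individual split, which is what makes the trees in the forest differ from one another.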

This exercise is part of the course

Model Validation in Python

Hands-on interactive exercise

Try this exercise by completing the sample code.

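# X_train, X_test, y_train and y_test are assumed to be preloaded by the exercise.
# mae is assumed to be scikit-learn's mean_absolute_error.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae

# Update the rfr model
rfr = RandomForestRegressor(____=25,
                            ____=1111,
                            ____=2)
rfr.fit(X_train, y_train)

# Print the training and testing errors (mean absolute error)
print('The training error is {0:.2f}'.format(
  mae(y_train, rfr.predict(X_train))))
print('The testing error is {0:.2f}'.format(
  mae(y_test, rfr.predict(X_test))))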
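One way to explore the effect of max_features is to refit the model with several values and compare training and testing errors side by side. The sketch below is not the exercise solution on the candy data. It runs on synthetic data (an assumption) with 11 columns, of which only the first two carry signal, and the candidate values 2, 6, and 11 are chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the candy data (an assumption): 85 rows, 11 columns,
# with only the first two columns carrying signal.
rng = np.random.default_rng(1111)
X = rng.normal(size=(85, 11))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=85)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111)

results = {}
for max_features in (2, 6, 11):
    rfr = RandomForestRegressor(n_estimators=25,
                                random_state=1111,
                                max_features=max_features)
    rfr.fit(X_train, y_train)
    train_err = mae(y_train, rfr.predict(X_train))
    test_err = mae(y_test, rfr.predict(X_test))
    results[max_features] = (train_err, test_err)
    print(f"max_features={max_features}: "
          f"training error {train_err:.2f}, testing error {test_err:.2f}")
```

A training error far below the testing error is the usual sign of overfitting. If the gap widens as max_features grows, the extra features are helping the model memorize the training rows rather than generalize.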


Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation
Exercise 2: Modeling steps
Exercise 3: Seen vs. unseen data
Exercise 4: Regression models
Exercise 5: Set parameters and fit a model
Exercise 6: Feature importances
Exercise 7: Classification models
Exercise 8: Classification predictions
Exercise 9: Reusing model parameters
Exercise 10: Random forest classifier

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.

Exercise 1: Creating train, test, and validation datasets
Exercise 2: Create one holdout set
Exercise 3: Create two holdout sets
Exercise 4: Why use holdout sets
Exercise 5: Accuracy metrics: regression models
Exercise 6: Mean absolute error
Exercise 7: Mean squared error
Exercise 8: Performance on data subsets
Exercise 9: Classification metrics
Exercise 10: Confusion matrices
Exercise 11: Confusion matrices, again
Exercise 12: Precision vs. recall
Exercise 13: The bias-variance tradeoff
Exercise 14: Error due to under/overfitting (current exercise)
Exercise 15: Am I underfitting?

Holdout sets are a great start to model validation. However, using a single train and test set is often not enough. Cross-validation is considered the gold standard when it comes to validating model performance and is almost always used when tuning model hyperparameters. This chapter focuses on performing cross-validation to validate model performance.

Exercise 1: The problems with holdout sets
Exercise 2: Two samples
Exercise 3: Potential problems
Exercise 4: Cross-validation
Exercise 5: scikit-learn's KFold()
Exercise 6: Using KFold indices
Exercise 7: sklearn's cross_val_score()
Exercise 8: scikit-learn's methods
Exercise 9: Implement cross_val_score()
Exercise 10: Leave-one-out-cross-validation (LOOCV)
Exercise 11: When to use LOOCV
Exercise 12: Leave-one-out-cross-validation
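The cross-validation workflow named in this chapter can be sketched roughly as follows. This is an illustration on synthetic data (an assumption), not material taken from the course. It scores a random forest with 5-fold cross-validation using mean absolute error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (an assumption): 85 rows, 5 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(85, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=85)

rfr = RandomForestRegressor(n_estimators=25, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# make_scorer defaults to greater_is_better=True, so the scores are plain MAE values.
scores = cross_val_score(rfr, X, y, cv=cv,
                         scoring=make_scorer(mean_absolute_error))
print("MAE per fold:", np.round(scores, 2))
print("mean MAE:", round(scores.mean(), 2))
```

Each row is used for testing exactly once, so the averaged score depends less on a single lucky or unlucky split than a lone holdout set does.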

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.

Exercise 1: Introduction to hyperparameter tuning
Exercise 2: Creating Hyperparameters
Exercise 3: Running a model using ranges
Exercise 4: RandomizedSearchCV
Exercise 5: Preparing for RandomizedSearch
Exercise 6: Implementing RandomizedSearchCV
Exercise 7: Selecting your final model
Exercise 8: Best classification accuracy
Exercise 9: Selecting the best precision model
Exercise 10: Course completed!
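As a hedged sketch of how cross-validation and hyperparameter tuning fit together, the example below runs scikit-learn's RandomizedSearchCV over a small grid. The synthetic data and the candidate values for max_depth and max_features are assumptions chosen for illustration, not values from the course.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (an assumption): 85 rows, 6 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(85, 6))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=85)

# Hypothetical candidate values to sample from.
param_dist = {"max_depth": [2, 4, 6, 8], "max_features": [2, 4, 6]}

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=25, random_state=0),
    param_distributions=param_dist,
    n_iter=5,                            # try 5 random parameter combinations
    cv=5,                                # score each one with 5-fold cross-validation
    scoring="neg_mean_absolute_error",   # scikit-learn maximizes, so MAE is negated
    random_state=0,
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best cross-validated MAE:", round(-search.best_score_, 2))
```

Because every candidate is scored on held-out folds, the selected parameters are less likely to be ones that merely fit the training rows well.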