Leave-one-out cross-validation

Suppose your favorite candy is not in the candy dataset and you want to know how popular it would be. With 5-fold cross-validation, each model is trained on only 80% of the data. The candy dataset has just 85 rows, though, and leaving out 20% of them could hurt our model. Leave-one-out cross-validation (LOOCV) trains each model on all but one row (84 of the 85), so it makes the most of our limited dataset and gives you the best estimate of your favorite candy's popularity!

In this exercise, you will use cross_val_score() to perform LOOCV.
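To make the difference in training-set size concrete, here is a minimal sketch that compares the splits produced by 5-fold cross-validation and by LOOCV on 85 rows. The array X below is only a placeholder with the same number of rows as the candy dataset (an assumption for illustration, not the real data).

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Placeholder data (assumption): only the number of rows, 85, matters here
X = np.zeros((85, 1))

# (train size, test size) for every split of each strategy
kf_sizes = [(len(train), len(test)) for train, test in KFold(n_splits=5).split(X)]
loo_sizes = [(len(train), len(test)) for train, test in LeaveOneOut().split(X)]

print(kf_sizes[0], len(kf_sizes))    # (68, 17) 5
print(loo_sizes[0], len(loo_sizes))  # (84, 1) 85
```

LOOCV fits 85 models instead of 5, so the larger training sets come at the price of more computation.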

This exercise is part of the course

Model Validation in Python

Exercise instructions

Create a scorer from mean_absolute_error for use in cross_val_score().
Complete cross_val_score() so that it uses the model rfr, the newly defined mae_scorer, and LOOCV.
Print the mean and the standard deviation of scores using numpy (loaded as np).

Hands-on interactive exercise

Try this exercise by completing the sample code below.

from sklearn.metrics import mean_absolute_error, make_scorer

# Create scorer
mae_scorer = ____(____)

rfr = RandomForestRegressor(n_estimators=15, random_state=1111)

# Implement LOOCV
scores = cross_val_score(____, X=X, y=y, cv=____, scoring=____)

# Print the mean and standard deviation
print("The mean of the errors is: %s." % np.____(____))
print("The standard deviation of the errors is: %s." % np.____(____))
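For reference, one way the completed template might look is sketched below. The candy dataset is not included here, so the sketch generates a synthetic stand-in with make_regression (85 rows, 5 features); that data, and the choice of cv=X.shape[0] to get one fold per row, are assumptions made for illustration rather than the course's official solution.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the candy data (assumption): 85 rows, 5 features
X, y = make_regression(n_samples=85, n_features=5, noise=10.0, random_state=1111)

# Create scorer: wraps mean_absolute_error so cross_val_score can call it
mae_scorer = make_scorer(mean_absolute_error)

rfr = RandomForestRegressor(n_estimators=15, random_state=1111)

# Implement LOOCV: with as many folds as rows, each fold holds out one row
scores = cross_val_score(rfr, X=X, y=y, cv=X.shape[0], scoring=mae_scorer)

# Print the mean and standard deviation of the 85 absolute errors
print("The mean of the errors is: %s." % np.mean(scores))
print("The standard deviation of the errors is: %s." % np.std(scores))
```

Passing cv=LeaveOneOut() from sklearn.model_selection should produce the same splits. Because each test fold contains a single row, every entry in scores is the absolute error of one prediction.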

Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation Exercise 2: Modeling steps Exercise 3: Seen vs. unseen data Exercise 4: Regression models Exercise 5: Set parameters and fit a model Exercise 6: Feature importances Exercise 7: Classification models Exercise 8: Classification predictions Exercise 9: Reusing model parameters Exercise 10: Random forest classifier

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.

Exercise 1: Creating train, test, and validation datasets Exercise 2: Create one holdout set Exercise 3: Create two holdout sets Exercise 4: Why use holdout sets Exercise 5: Accuracy metrics: regression models Exercise 6: Mean absolute error Exercise 7: Mean squared error Exercise 8: Performance on data subsets Exercise 9: Classification metrics Exercise 10: Confusion matrices Exercise 11: Confusion matrices, again Exercise 12: Precision vs. recall Exercise 13: The bias-variance tradeoff Exercise 14: Error due to under/over-fitting Exercise 15: Am I underfitting?

Holdout sets are a great start to model validation. However, using a single train and test set is often not enough. Cross-validation is considered the gold standard when it comes to validating model performance and is almost always used when tuning model hyperparameters. This chapter focuses on performing cross-validation to validate model performance.

Exercise 1: The problems with holdout sets Exercise 2: Two samples Exercise 3: Potential problems Exercise 4: Cross-validation Exercise 5: scikit-learn's KFold() Exercise 6: Using KFold indices Exercise 7: sklearn's cross_val_score() Exercise 8: scikit-learn's methods Exercise 9: Implement cross_val_score() Exercise 10: Leave-one-out cross-validation (LOOCV) Exercise 11: When to use LOOCV Exercise 12: Leave-one-out cross-validation

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.

Exercise 1: Introduction to hyperparameter tuning Exercise 2: Creating Hyperparameters Exercise 3: Running a model using ranges Exercise 4: RandomizedSearchCV Exercise 5: Preparing for RandomizedSearch Exercise 6: Implementing RandomizedSearchCV Exercise 7: Selecting your final model Exercise 8: Best classification accuracy Exercise 9: Selecting the best precision model Exercise 10: Course completed!