Am I underfitting?

You are building a random forest model to predict whether you will win a future tic-tac-toe game. Using the tic_tac_toe dataset, you have created training and testing datasets: X_train, X_test, y_train, and y_test.

You have decided to build several random forest models with different numbers of trees (1, 2, 3, 4, 5, 10, 20, and 50). The more trees you use, the longer your random forest model takes to run. However, if you use too few trees, you risk underfitting. You have written a for loop to test your model with each of these tree counts.

This exercise is part of the course

<Kurs>Model Validation in Python</Kurs>

Exercise instructions

In each loop iteration, predict values for both the X_train and X_test datasets.
In each iteration, append the accuracy_score() of y_train and the corresponding predictions to train_scores.
In each iteration, append the accuracy_score() of y_test and the corresponding predictions to test_scores.
Print the training and testing scores using the print statements.

Work through this exercise using the following sample code.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train and y_test come from the tic_tac_toe split described above.
test_scores, train_scores = [], []
for i in [1, 2, 3, 4, 5, 10, 20, 50]:
    rfc = RandomForestClassifier(n_estimators=i, random_state=1111)
    rfc.fit(X_train, y_train)
    # Create predictions for the X_train and X_test datasets.
    train_predictions = rfc.predict(X_train)
    test_predictions = rfc.predict(X_test)
    # Append the accuracy score for the test and train predictions.
    train_scores.append(round(accuracy_score(y_train, train_predictions), 2))
    test_scores.append(round(accuracy_score(y_test, test_predictions), 2))
# Print the train and test scores.
print("The training scores were: {}".format(train_scores))
print("The testing scores were: {}".format(test_scores))
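The exercise code relies on X_train, X_test, y_train and y_test from the course environment. Below is a minimal, self-contained sketch of the same loop. It assumes synthetic data from make_classification as a stand-in for tic_tac_toe, which does not ship with scikit-learn, so the printed scores will not match the exercise. It also prints the train/test gap for each forest size, the quantity you inspect when deciding whether a model underfits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the tic_tac_toe dataset.
X, y = make_classification(n_samples=1000, n_features=9, random_state=1111)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)

tree_counts = [1, 2, 3, 4, 5, 10, 20, 50]
train_scores, test_scores = [], []
for n_trees in tree_counts:
    rfc = RandomForestClassifier(n_estimators=n_trees, random_state=1111)
    rfc.fit(X_train, y_train)
    # Score the model on its own training data and on held-out test data.
    train_scores.append(round(accuracy_score(y_train, rfc.predict(X_train)), 2))
    test_scores.append(round(accuracy_score(y_test, rfc.predict(X_test)), 2))

# Show both scores side by side with the gap between them.
for n_trees, tr, te in zip(tree_counts, train_scores, test_scores):
    print(f"{n_trees:>2} trees: train={tr:.2f}, test={te:.2f}, gap={tr - te:.2f}")
```

Low accuracy on both sets suggests underfitting. A training score far above the test score suggests overfitting. Once the test score stops improving as trees are added, extra trees mostly add run time.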


Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation
Exercise 2: Modeling steps
Exercise 3: Seen vs. unseen data
Exercise 4: Regression models
Exercise 5: Set parameters and fit a model
Exercise 6: Feature importances
Exercise 7: Classification models
Exercise 8: Classification predictions
Exercise 9: Reusing model parameters
Exercise 10: Random forest classifier
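Here is a minimal sketch of the scikit-learn fit/predict workflow this chapter covers. It assumes synthetic data from make_regression and make_classification in place of the course datasets. Parameters are set when the estimator is constructed. feature_importances_ is read after fitting. get_params() returns the settings so they can be reused to build a new, unfitted model.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Assumption: synthetic data stands in for the course datasets.
X_reg, y_reg = make_regression(n_samples=200, n_features=4, random_state=1111)
X_clf, y_clf = make_classification(n_samples=200, n_features=4, random_state=1111)

# Regression: set parameters at construction time, then fit.
rfr = RandomForestRegressor(n_estimators=50, max_depth=5, random_state=1111)
rfr.fit(X_reg, y_reg)
print("Feature importances:", rfr.feature_importances_.round(2))

# Classification: predict() returns labels, predict_proba() returns class probabilities.
rfc = RandomForestClassifier(n_estimators=50, random_state=1111)
rfc.fit(X_clf, y_clf)
print("Labels:", rfc.predict(X_clf[:5]))
print("Probabilities:", rfc.predict_proba(X_clf[:5]).round(2))

# Reuse the fitted model's parameters to create a fresh, unfitted copy.
rfc_copy = RandomForestClassifier(**rfc.get_params())
```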

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets to understanding the bias-variance tradeoff, we build the foundation for the K-fold and leave-one-out cross-validation techniques practiced in chapter 3.

Exercise 1: Creating train, test, and validation datasets
Exercise 2: Create one holdout set
Exercise 3: Create two holdout sets
Exercise 4: Why use holdout sets
Exercise 5: Accuracy metrics: regression models
Exercise 6: Mean absolute error
Exercise 7: Mean squared error
Exercise 8: Performance on data subsets
Exercise 9: Classification metrics
Exercise 10: Confusion matrices
Exercise 11: Confusion matrices again
Exercise 12: Precision vs. recall
Exercise 13: The bias-variance tradeoff
Exercise 14: Error due to underfitting/overfitting
Exercise 15: Am I underfitting?
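One common way to build training, validation, and testing sets is to call train_test_split twice. This sketch uses hypothetical random data and an assumed 60/20/20 split. The proportions are illustrative, not prescribed by the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1,000 rows with 5 features and a binary target.
rng = np.random.default_rng(1111)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)

# First hold out 20% as a test set that is only used at the very end.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)
# Split the remaining 80% into training (75%) and validation (25%),
# which gives 60% / 20% / 20% of the original rows.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=1111
)

print(len(X_train), len(X_val), len(X_test))
```

The validation set is used to compare candidate models. The test set gives a final estimate on data the model selection process never saw.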


Holdout sets are a great start to model validation. However, a single train and test set is often not enough. Cross-validation is considered the gold standard for validating model performance and is almost always used when tuning model hyperparameters. This chapter focuses on performing cross-validation to validate model performance.

Exercise 1: The problems with holdout sets
Exercise 2: Two samples
Exercise 3: Potential problems
Exercise 4: Cross-validation
Exercise 5: scikit-learn's KFold()
Exercise 6: Using KFold indices
Exercise 7: sklearn's cross_val_score()
Exercise 8: scikit-learn's methods
Exercise 9: Implement cross_val_score()
Exercise 10: Leave-one-out-cross-validation (LOOCV)
Exercise 11: When to use LOOCV
Exercise 12: Leave-one-out-cross-validation
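Here is a sketch of KFold, cross_val_score() and leave-one-out cross-validation. It assumes a small synthetic regression problem in place of the course data. make_scorer(mean_squared_error) reports positive MSE values per fold. The built-in scoring string "neg_mean_squared_error" is an alternative that returns them negated.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Assumption: a small synthetic regression problem stands in for the course data.
X, y = make_regression(n_samples=60, n_features=3, noise=10.0, random_state=1111)
rfr = RandomForestRegressor(n_estimators=25, random_state=1111)

# KFold yields (train_index, validation_index) pairs; every row is validated exactly once.
kf = KFold(n_splits=5, shuffle=True, random_state=1111)
for train_idx, val_idx in kf.split(X):
    print(len(train_idx), len(val_idx))

# cross_val_score runs the fit-and-score loop over the folds for us.
mse = make_scorer(mean_squared_error)
kfold_scores = cross_val_score(rfr, X, y, cv=kf, scoring=mse)
print("5-fold MSE per fold:", kfold_scores.round(1))

# Leave-one-out: one fold per row, each validating on a single observation.
loo_scores = cross_val_score(rfr, X, y, cv=LeaveOneOut(), scoring=mse)
print("LOOCV folds:", len(loo_scores), "mean MSE:", round(loo_scores.mean(), 1))
```

LOOCV trains one model per row, so here it fits 60 forests. That cost grows with the dataset, which is why k-fold is usually preferred on larger data.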

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.

Exercise 1: Introduction to hyperparameter tuning
Exercise 2: Creating Hyperparameters
Exercise 3: Running a model using ranges
Exercise 4: RandomizedSearchCV
Exercise 5: Preparing for RandomizedSearch
Exercise 6: Implementing RandomizedSearchCV
Exercise 7: Selecting your final model
Exercise 8: Best classification accuracy
Exercise 9: Selecting the best precision model
Exercise 10: Course completed!
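Here is a minimal sketch of hyperparameter tuning with RandomizedSearchCV. It assumes synthetic data and illustrative hyperparameter ranges. Each of the n_iter sampled combinations is evaluated with 5-fold cross-validation, using precision as the metric. best_estimator_ then holds a model refit on all the data with the winning settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import RandomizedSearchCV

# Assumption: synthetic data stands in for the course dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=1111)

# Candidate values for each hyperparameter; these ranges are illustrative only.
param_dist = {
    "max_depth": [2, 4, 8, None],
    "min_samples_split": list(range(2, 11, 2)),
    "max_features": [2, 4, 6],
}

# Sample 10 of the 60 possible combinations and score each with 5-fold CV.
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=1111),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    scoring=make_scorer(precision_score),
    random_state=1111,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV precision:", round(search.best_score_, 3))

# The refit model with the best settings, ready for final evaluation.
final_model = search.best_estimator_
```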