Am I underfitting?

You are building a random forest model to predict whether you will win a future game of tic-tac-toe. Using the tic_tac_toe dataset, you have created training and test sets: X_train, X_test, y_train, and y_test.

You have decided to build several random forest models, each with a different number of trees (1, 2, 3, 4, 5, 10, 20, and 50). The more trees you use, the longer the model takes to run. However, if you do not use enough trees, you risk underfitting. You have written a for loop to test your model at each number of trees.

This exercise is part of the course

Model Validation in Python

Exercise instructions

On each iteration, predict values for both X_train and X_test.
On each iteration, append the accuracy_score() of y_train and its corresponding predictions to train_scores.
On each iteration, append the accuracy_score() of y_test and its corresponding predictions to test_scores.
Print the training and testing scores using the print statements.

Hands-on interactive exercise

Try this exercise by completing this sample code.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

test_scores, train_scores = [], []
for i in [1, 2, 3, 4, 5, 10, 20, 50]:
    rfc = RandomForestClassifier(n_estimators=i, random_state=1111)
    rfc.fit(X_train, y_train)
    # Create predictions for the X_train and X_test datasets.
    train_predictions = rfc.predict(____)
    test_predictions = rfc.predict(____)
    # Append the accuracy score for the test and train predictions.
    train_scores.append(round(____(____, ____), 2))
    test_scores.append(round(____(____, ____), 2))
# Print the train and test scores.
print("The training scores were: {}".format(____))
print("The testing scores were: {}".format(____))
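
One possible way to fill in the blanks is sketched below. The tic_tac_toe dataset is not included here, so this sketch assumes a synthetic stand-in built with make_classification; the data, the split sizes, and the resulting scores are illustrative only and will not match the exercise's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the tic_tac_toe data (9 features, 2 classes).
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=1111
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)

test_scores, train_scores = [], []
for i in [1, 2, 3, 4, 5, 10, 20, 50]:
    rfc = RandomForestClassifier(n_estimators=i, random_state=1111)
    rfc.fit(X_train, y_train)
    # Predict on both the data the model has seen and the data it has not.
    train_predictions = rfc.predict(X_train)
    test_predictions = rfc.predict(X_test)
    # accuracy_score takes the true labels first, then the predictions.
    train_scores.append(round(accuracy_score(y_train, train_predictions), 2))
    test_scores.append(round(accuracy_score(y_test, test_predictions), 2))

print("The training scores were: {}".format(train_scores))
print("The testing scores were: {}".format(test_scores))
```

When reading the output, low scores on both the training and the test set at small tree counts would suggest underfitting; a training score that keeps climbing while the test score stalls points the other way, toward overfitting.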



Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation
Exercise 2: Modeling steps
Exercise 3: Seen vs. unseen data
Exercise 4: Regression models
Exercise 5: Set parameters and fit a model
Exercise 6: Feature importances
Exercise 7: Classification models
Exercise 8: Classification predictions
Exercise 9: Reusing model parameters
Exercise 10: Random forest classifier

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.
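
As a minimal sketch of the training/validation/testing split mentioned above, train_test_split can be applied twice. The toy arrays and the 60/20/20 proportions are assumptions chosen for illustration, not values taken from the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: toy data with 100 rows, used only to show the resulting sizes.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First split: hold out 20% of the rows as the final test set.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)
# Second split: 25% of the remaining 80 rows becomes the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=1111
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```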

Exercise 1: Create training, testing, and validation datasets
Exercise 2: Create one holdout set
Exercise 3: Create two holdout sets
Exercise 4: Why use holdout sets
Exercise 5: Accuracy metrics: regression models
Exercise 6: Mean absolute error
Exercise 7: Mean squared error
Exercise 8: Performance on data subsets
Exercise 9: Classification metrics
Exercise 10: Confusion matrices
Exercise 11: Confusion matrices, again
Exercise 12: Precision vs. recall
Exercise 13: The bias-variance tradeoff
Exercise 14: Error due to underfitting/overfitting
Exercise 15: Am I underfitting?


Holdout sets are a great start to model validation. However, using a single train and test set is often not enough. Cross-validation is considered the gold standard for validating model performance and is almost always used when tuning model hyperparameters. This chapter focuses on performing cross-validation to validate model performance.
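
A minimal sketch of cross-validation with scikit-learn's cross_val_score() follows. The synthetic data, the 5 folds, and the 20 trees are assumptions made for illustration; the course's own exercises may use different data and settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumption: synthetic classification data stands in for a real dataset.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=1111
)

rfc = RandomForestClassifier(n_estimators=20, random_state=1111)

# Fit and score the model on 5 different train/validation splits.
cv_scores = cross_val_score(rfc, X, y, cv=5, scoring="accuracy")

print("Fold accuracies: {}".format(cv_scores))
print("Mean accuracy: {:.2f}".format(cv_scores.mean()))
```

Averaging over several folds makes the estimate less dependent on which rows happened to land in a single holdout set.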

Exercise 1: The problems with holdout sets
Exercise 2: Two samples
Exercise 3: Potential problems
Exercise 4: Cross-validation
Exercise 5: scikit-learn's KFold()
Exercise 6: Using KFold indices
Exercise 7: sklearn's cross_val_score()
Exercise 8: scikit-learn's methods
Exercise 9: Implement cross_val_score()
Exercise 10: Leave-one-out-cross-validation (LOOCV)
Exercise 11: When to use LOOCV
Exercise 12: Leave-one-out-cross-validation

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.
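
To illustrate how cross-validation supports hyperparameter tuning, here is a small sketch using RandomizedSearchCV. The parameter ranges, the number of sampled candidates, and the synthetic data are all hypothetical choices for this example, not the settings used in the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Assumption: synthetic classification data stands in for a real dataset.
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=1111
)

# Hypothetical search space; each candidate is sampled from these lists.
param_dist = {
    "n_estimators": [5, 10, 20, 50],
    "max_depth": [2, 4, 6, None],
    "max_features": [2, 4, 6],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1111),
    param_distributions=param_dist,
    n_iter=5,            # try 5 random parameter combinations
    cv=3,                # score each combination with 3-fold cross-validation
    scoring="accuracy",
    random_state=1111,
)
search.fit(X, y)

print("Best parameters: {}".format(search.best_params_))
print("Best cross-validated accuracy: {:.2f}".format(search.best_score_))
```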

Exercise 1: Introduction to hyperparameter tuning
Exercise 2: Creating Hyperparameters
Exercise 3: Running a model using ranges
Exercise 4: RandomizedSearchCV
Exercise 5: Preparing for RandomizedSearch
Exercise 6: Implementing RandomizedSearchCV
Exercise 7: Selecting your final model
Exercise 8: Best classification accuracy
Exercise 9: Selecting the best precision model
Exercise 10: Course completed!