Estou com underfitting?

Você está criando um modelo de random forest para prever se vai vencer um jogo futuro de jogo da velha (Tic-Tac-Toe). Usando o conjunto de dados tic_tac_toe, você criou os conjuntos de treino e teste, X_train, X_test, y_train e y_test.

Você decidiu criar vários modelos de random forest com quantidades diferentes de árvores (1, 2, 3, 4, 5, 10, 20 e 50). Quanto mais árvores você usar, mais tempo o modelo levará para rodar. No entanto, se você não usar árvores suficientes, corre o risco de underfitting. Você criou um loop for para testar seu modelo com diferentes números de árvores.

Este exercicio faz parte do curso

Validação de Modelos em Python

Instruções do exercicio

Em cada iteração do loop, gere previsões para os conjuntos X_train e X_test.
Em cada iteração, adicione o accuracy_score() do conjunto y_train e das previsões correspondentes a train_scores.
Em cada iteração, adicione o accuracy_score() do conjunto y_test e das previsões correspondentes a test_scores.
Imprima as pontuações de treino e de teste usando os comandos de impressão.

exercicio interativo prático

Tente este exercicio completando este código de exemplo.

from sklearn.metrics import accuracy_score

test_scores, train_scores = [], []
for i in [1, 2, 3, 4, 5, 10, 20, 50]:
    rfc = RandomForestClassifier(n_estimators=i, random_state=1111)
    rfc.fit(X_train, y_train)
    # Create predictions for the X_train and X_test datasets.
    train_predictions = rfc.predict(____)
    test_predictions = rfc.predict(____)
    # Append the accuracy score for the test and train predictions.
    train_scores.append(round(____(____, ____), 2))
    test_scores.append(round(____(____, ____), 2))
# Print the train and test scores.
print("The training scores were: {}".format(____))
print("The testing scores were: {}".format(____))

Editar e Executar Código

Este exercicio faz parte do curso

Validação de Modelos em Python

IntermediárioNível de habilidade

4.9+

Comece o curso gratuitamente

Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.

Exercise 1: Introduction to model validation Exercise 2: Modeling steps Exercise 3: Seen vs. unseen data Exercise 4: Regression models Exercise 5: Set parameters and fit a model Exercise 6: Feature importances Exercise 7: Classification models Exercise 8: Classification predictions Exercise 9: Reusing model parameters Exercise 10: Random forest classifier

This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.

Exercise 1: Criando conjuntos de dados de treino, teste e validação Exercise 2: Crie um conjunto de validação (holdout)Exercise 3: Crie dois conjuntos de holdout Exercise 4: Por que usar conjuntos de holdout Exercise 5: Métricas de acurácia: modelos de regressão Exercise 6: Erro absoluto médio Exercise 7: Erro quadrático médio Exercise 8: Desempenho em subconjuntos de dados Exercise 9: Métricas de classificação Exercise 10: Matrizes de confusão Exercise 11: Matrizes de confusão, novamente Exercise 12: Precisão vs. recall Exercise 13: O trade-off entre viés e variância Exercise 14: Erro por under/overfitting Exercise 15: Estou com underfitting?

Exercicio Atual

Holdout sets are a great start to model validation. However, using a single train and test set if often not enough. Cross-validation is considered the gold standard when it comes to validating model performance and is almost always used when tuning model hyper-parameters. This chapter focuses on performing cross-validation to validate model performance.

Exercise 1: The problems with holdout sets Exercise 2: Two samples Exercise 3: Potential problems Exercise 4: Cross-validation Exercise 5: scikit-learn's KFold()Exercise 6: Using KFold indices Exercise 7: sklearn's cross_val_score()Exercise 8: scikit-learn's methods Exercise 9: Implement cross_val_score()Exercise 10: Leave-one-out-cross-validation (LOOCV)Exercise 11: When to use LOOCV Exercise 12: Leave-one-out-cross-validation

The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.

Exercise 1: Introduction to hyperparameter tuning Exercise 2: Creating Hyperparameters Exercise 3: Running a model using ranges Exercise 4: RandomizedSearchCV Exercise 5: Preparing for RandomizedSearch Exercise 6: Implementing RandomizedSearchCV Exercise 7: Selecting your final model Exercise 8: Best classification accuracy Exercise 9: Selecting the best precision model Exercise 10: Course completed!