
GridSearchCV to find optimal parameters

In this exercise, you're going to tweak the model in a less "random" way by letting GridSearchCV do the work for you.

With GridSearchCV, you can define which performance metric to score the options on. Since fraud detection is mostly about catching as many fraud cases as possible, you can optimize your model settings to get the best possible Recall score. If you also cared about reducing the number of false positives, you could optimize on F1-score instead, which balances Precision against Recall.
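To make the difference between the two metrics concrete, here is a minimal sketch that computes them on a handful of made-up labels. The arrays `y_true` and `y_pred` are invented for illustration and are not part of the course data.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration: 1 = fraud, 0 = legitimate
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Recall: share of actual fraud cases that were caught (3 of 4)
recall = recall_score(y_true, y_pred)

# Precision: share of flagged cases that really were fraud (3 of 5)
precision = precision_score(y_true, y_pred)

# F1: harmonic mean of precision and recall
f1 = f1_score(y_true, y_pred)

print(f"recall={recall:.2f}, precision={precision:.2f}, f1={f1:.2f}")
```

Here the two false positives lower Precision and F1 but leave Recall untouched, which is why a search scored on Recall alone can favor models that flag many legitimate cases.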

GridSearchCV has already been imported from sklearn.model_selection, so let's give it a try!

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Define in the parameter grid that you want to try 1 and 30 trees, and that you want to try the gini and entropy split criteria.
  • Define the model to be a simple RandomForestClassifier; keep the random_state at 5 so that you can compare models.
  • Set the scoring option such that it optimizes for recall.
  • Fit the model to the training data X_train and y_train and obtain the best parameters for the model.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Define the parameter sets to test
param_grid = {
    'n_estimators': [____, ____],
    'max_features': ['auto', 'log2'],
    'max_depth': [4, 8],
    'criterion': ['____', '____'],
}

# Define the model to use
model = ____(random_state=5)

# Combine the parameter sets with the defined model
CV_model = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='____', n_jobs=-1)

# Fit the model to our training data and obtain best parameters
CV_model.fit(____, ____)
CV_model.____
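For reference, the same workflow can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not the course's own solution: the dataset comes from `make_classification` rather than the course's fraud data, `n_jobs=1` is used to keep the run simple, and `'sqrt'` replaces `'auto'` because recent scikit-learn releases (1.3 and later) no longer accept `'auto'` for `max_features`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the fraud data (assumption: ~5% positives)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Parameter sets to test; 'sqrt' stands in for the removed 'auto' option
param_grid = {
    'n_estimators': [1, 30],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 8],
    'criterion': ['gini', 'entropy'],
}

# Fixed random_state so that runs are comparable
model = RandomForestClassifier(random_state=5)

# 5-fold cross-validated grid search, scored on recall
# (the exercise uses n_jobs=-1 to run the fits in parallel)
CV_model = GridSearchCV(estimator=model, param_grid=param_grid, cv=5,
                        scoring='recall', n_jobs=1)
CV_model.fit(X_train, y_train)

# Best combination found and its mean cross-validated recall
best_params = CV_model.best_params_
print(best_params)
print(round(CV_model.best_score_, 3))
```

The `best_params_` attribute holds the winning combination, and `best_estimator_` is the model refit on the full training set with those settings, so it can be evaluated directly on held-out data.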