GridSearchCV to find optimal parameters
In this exercise you're going to tune your model in a less "random" way, by letting GridSearchCV do the work for you.
With GridSearchCV you can define which performance metric to score the candidate settings on. Since in fraud detection we are mostly interested in catching as many fraud cases as possible, you can optimize your model settings to get the best possible recall score. If you also cared about reducing the number of false positives, you could optimize on the F1-score instead, which gives you the precision-recall trade-off.
GridSearchCV has already been imported from sklearn.model_selection, so let's give it a try!
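As a quick illustration (a minimal sketch, separate from the exercise code below), the scoring metric is passed to GridSearchCV as a string, so switching the optimization target between recall and F1 is a one-word change:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Optimize for recall: catch as many fraud cases as possible
grid_recall = GridSearchCV(RandomForestClassifier(random_state=5),
                           param_grid={'n_estimators': [1, 30]},
                           scoring='recall', cv=5)

# Optimize for F1 instead: trade some recall for fewer false positives
grid_f1 = GridSearchCV(RandomForestClassifier(random_state=5),
                       param_grid={'n_estimators': [1, 30]},
                       scoring='f1', cv=5)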
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Define in the parameter grid that you want to try 1 and 30 trees, and that you want to try both the gini and entropy split criteria.
- Define the model to be a simple RandomForestClassifier; keep random_state at 5 to be able to compare models.
- Set the scoring option such that it optimizes for recall.
- Fit the model to the training data X_train and y_train, and obtain the best parameters for the model.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define the parameter sets to test
param_grid = {'n_estimators': [____, ____],
              'max_features': ['auto', 'log2'],
              'max_depth': [4, 8],
              'criterion': ['____', '____']}
# Define the model to use
model = ____(random_state=5)
# Combine the parameter sets with the defined model
CV_model = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='____', n_jobs=-1)
# Fit the model to our training data and obtain best parameters
CV_model.fit(____, ____)
CV_model.____
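For reference, below is one possible completion, a sketch based on the instructions above rather than the official solution. It generates a small imbalanced dataset with make_classification to stand in for the course's X_train and y_train, and it uses 'sqrt' in place of 'auto' for max_features, since 'auto' was removed for RandomForestClassifier in scikit-learn 1.3.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Stand-in for the course's training data: a small, imbalanced dataset
X_train, y_train = make_classification(n_samples=500, weights=[0.95], random_state=0)

# Define the parameter sets to test: 1 vs 30 trees, both split criteria
param_grid = {'n_estimators': [1, 30],
              'max_features': ['sqrt', 'log2'],  # 'auto' was removed in scikit-learn 1.3
              'max_depth': [4, 8],
              'criterion': ['gini', 'entropy']}

# Define the model; a fixed random_state keeps models comparable
model = RandomForestClassifier(random_state=5)

# Combine the parameter grid with the model, optimizing for recall
CV_model = GridSearchCV(estimator=model, param_grid=param_grid,
                        cv=5, scoring='recall', n_jobs=-1)

# Fit to the training data and obtain the best parameters
CV_model.fit(X_train, y_train)
print(CV_model.best_params_)

After fitting, CV_model.best_params_ holds the winning combination, and since GridSearchCV refits on the full training data by default, CV_model.best_estimator_ is a ready-to-use model for predictions.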