Performance metrics for the RF model

In the previous exercises you obtained an accuracy score for your random forest model. This time, we know accuracy can be misleading in the case of fraud detection. With highly imbalanced fraud data, the AUROC curve is a more reliable performance metric, used to compare different classifiers. Moreover, the classification report tells you about the precision and recall of your model, whilst the confusion matrix actually shows how many fraud cases you can predict correctly. So let's get these performance metrics.

You'll continue working on the same random forest model from the previous exercise. Your model, defined as model = RandomForestClassifier(random_state=5) has been fitted to your training data already, and X_train, y_train, X_test, y_test are available.

Diese Übung ist Teil des Kurses

Fraud Detection in Python

Kurs anzeigen

Anleitung zur Übung

Import the classification report, confusion matrix and ROC score from sklearn.metrics.
Get the binary predictions from your trained random forest model.
Get the predicted probabilities by running the predict_proba() function.
Obtain classification report and confusion matrix by comparing y_test with predicted.

Interaktive Übung

Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.

# Import the packages to get the different performance metrics
from sklearn.metrics import ____, ____, ____

# Obtain the predictions from our random forest model 
predicted = model.____(X_test)

# Predict probabilities
probs = ____.____(X_test)

# Print the ROC curve, classification report and confusion matrix
print(____(y_test, probs[:,1]))
print(____(____, predicted))
print(____(____, ____))

Code bearbeiten und ausführen