
Performance metrics for the RF model

In the previous exercises you obtained an accuracy score for your random forest model. But accuracy can be misleading in fraud detection: with highly imbalanced fraud data, the AUROC (area under the ROC curve) is a more reliable performance metric for comparing classifiers. Moreover, the classification report tells you about the precision and recall of your model, whilst the confusion matrix shows how many fraud cases you predict correctly. So let's get these performance metrics.
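To see concretely why accuracy misleads here, consider a toy example with hypothetical numbers (not the exercise data): a naive "classifier" that always predicts non-fraud still reaches 99% accuracy when only 1% of transactions are fraudulent, while catching no fraud at all.

import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 1 fraud case among 100 transactions
y_true = np.array([0] * 99 + [1])

# Naive model that always predicts "not fraud"
y_naive = np.zeros(100, dtype=int)

# Prints 0.99 -- high accuracy, yet zero fraud cases detected
print(accuracy_score(y_true, y_naive))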

You'll continue working on the same random forest model from the previous exercise. Your model, defined as model = RandomForestClassifier(random_state=5), has already been fitted to your training data, and X_train, y_train, X_test and y_test are available.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Import the classification report, confusion matrix and ROC AUC score from sklearn.metrics.
  • Get the binary predictions from your trained random forest model.
  • Get the predicted probabilities with the predict_proba() function.
  • Obtain the classification report and confusion matrix by comparing y_test with the predictions.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the packages to get the different performance metrics
from sklearn.metrics import ____, ____, ____

# Obtain the predictions from our random forest model 
predicted = model.____(X_test)

# Predict probabilities
probs = ____.____(X_test)

# Print the ROC-AUC score, classification report and confusion matrix
print(____(y_test, probs[:,1]))
print(____(____, predicted))
print(____(____, ____))
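For reference, here is one way the blanks might be filled in; this is a sketch that assumes model is the fitted RandomForestClassifier and the X_test/y_test split from the exercise.

# Import the packages to get the different performance metrics
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Obtain the binary predictions from our random forest model
predicted = model.predict(X_test)

# Predict class probabilities; column 1 is the probability of fraud
probs = model.predict_proba(X_test)

# Print the ROC-AUC score, classification report and confusion matrix
print(roc_auc_score(y_test, probs[:, 1]))
print(classification_report(y_test, predicted))
print(confusion_matrix(y_test, predicted))

Note that roc_auc_score takes the positive-class probabilities (probs[:, 1]), whereas the classification report and confusion matrix compare the hard predictions against y_test.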