Performance metrics for the RF model
In the previous exercises you obtained an accuracy score for your random forest model. However, accuracy can be misleading in the case of fraud detection: with highly imbalanced fraud data, the area under the ROC curve (AUROC) is a more reliable performance metric for comparing different classifiers. Moreover, the classification report tells you about the precision and recall of your model, whilst the confusion matrix shows how many fraud cases you predict correctly and how many you miss. So let's get these performance metrics.
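To see why accuracy misleads, consider a quick illustrative sketch (the numbers below are made up for demonstration): on data with 1% fraud, a classifier that always predicts "not fraud" reaches 99% accuracy, yet its ROC AUC is only 0.5, the chance level.

# Illustrative only: accuracy vs. ROC AUC on imbalanced data
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0] * 990 + [1] * 10)  # 1% fraud cases
y_pred = np.zeros(1000, dtype=int)       # always predict "not fraud"
scores = np.zeros(1000)                  # constant predicted probabilities

print(accuracy_score(y_true, y_pred))    # 0.99, looks great
print(roc_auc_score(y_true, scores))     # 0.5, no better than chance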
You'll continue working on the same random forest model from the previous exercise. Your model, defined as model = RandomForestClassifier(random_state=5), has been fitted to your training data already, and X_train, y_train, X_test and y_test are available.
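If you want to reproduce that setup outside the course environment, here is a minimal sketch; the synthetic data and split parameters below are assumptions for illustration, not the course's actual fraud dataset.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data (illustrative assumption)
X, y = make_classification(n_samples=5000, weights=[0.99], random_state=5)

# Split parameters are illustrative, not the course's exact split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=5)

model = RandomForestClassifier(random_state=5)
model.fit(X_train, y_train)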
Exercise instructions
- Import the classification report, confusion matrix and ROC AUC score from sklearn.metrics.
- Get the binary predictions from your trained random forest model.
- Get the predicted probabilities by running the predict_proba() function.
- Obtain the classification report and confusion matrix by comparing y_test with predicted.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the packages to get the different performance metrics
from sklearn.metrics import ____, ____, ____
# Obtain the predictions from our random forest model
predicted = model.____(X_test)
# Predict probabilities
probs = ____.____(X_test)
# Print the ROC AUC score, classification report and confusion matrix
print(____(y_test, probs[:,1]))
print(____(____, predicted))
print(____(____, ____))
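If you get stuck, here is one way the blanks could be filled in; treat it as a sketch rather than the official solution (it assumes the fitted model and the data splits described above).

# Import the packages to get the different performance metrics
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Obtain the binary predictions from the fitted random forest model
predicted = model.predict(X_test)

# Predict class probabilities; column 1 holds the probability of fraud
probs = model.predict_proba(X_test)

# Print the ROC AUC score, classification report and confusion matrix
print(roc_auc_score(y_test, probs[:, 1]))
print(classification_report(y_test, predicted))
print(confusion_matrix(y_test, predicted))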