
Performance metrics for the RF model

In the previous exercises you obtained an accuracy score for your random forest model. However, accuracy can be misleading in fraud detection: with highly imbalanced fraud data, the area under the ROC curve (AUROC) is a more reliable performance metric for comparing classifiers. Moreover, the classification report tells you about the precision and recall of your model, whilst the confusion matrix shows exactly how many fraud cases you predict correctly. So let's get these performance metrics.
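As a side note, here is a minimal, self-contained sketch (not part of the exercise, assuming only scikit-learn is installed) of why accuracy misleads here: a classifier that never predicts fraud still scores roughly 99% accuracy on data with 1% fraud cases, while its AUROC stays at chance level.

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Simulate imbalanced data: roughly 1% positive (fraud) cases
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# A baseline that always predicts the majority class ("not fraud")
dummy = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)

# Accuracy looks great (~0.99), but AUROC reveals no discrimination (0.5)
print(accuracy_score(y_te, dummy.predict(X_te)))
print(roc_auc_score(y_te, dummy.predict_proba(X_te)[:, 1]))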

You'll continue working on the same random forest model from the previous exercise. Your model, defined as model = RandomForestClassifier(random_state=5), has been fitted to your training data already, and X_train, y_train, X_test and y_test are available.
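For reference, the setup described above amounts to something like the following sketch (assuming X_train and y_train are already loaded):

from sklearn.ensemble import RandomForestClassifier

# Define and fit the random forest model from the previous exercise
model = RandomForestClassifier(random_state=5)
model.fit(X_train, y_train)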

This exercise is part of the course

Fraud Detection in Python


Instructions

  • Import the classification report, confusion matrix and ROC AUC score from sklearn.metrics.
  • Get the binary predictions from your trained random forest model.
  • Get the predicted probabilities by running the predict_proba() function.
  • Obtain the classification report and confusion matrix by comparing y_test with predicted.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the packages to get the different performance metrics
from sklearn.metrics import ____, ____, ____

# Obtain the predictions from our random forest model 
predicted = model.____(X_test)

# Predict probabilities
probs = ____.____(X_test)

# Print the ROC AUC score, classification report and confusion matrix
print(____(y_test, probs[:,1]))
print(____(____, predicted))
print(____(____, ____))
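For reference, here is one possible completion of the exercise, assuming model is the fitted RandomForestClassifier described above:

# Import the packages to get the different performance metrics
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Obtain the binary predictions from our random forest model
predicted = model.predict(X_test)

# Predict class probabilities
probs = model.predict_proba(X_test)

# Print the ROC AUC score, classification report and confusion matrix
print(roc_auc_score(y_test, probs[:, 1]))
print(classification_report(y_test, predicted))
print(confusion_matrix(y_test, predicted))

Note that probs[:, 1] selects the predicted probability of the positive (fraud) class, which is what roc_auc_score expects as its second argument.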