Checking results
In this exercise you're going to check the results of your DBSCAN fraud detection model. In practice, you often don't have reliable labels, and this is where a fraud analyst can help you validate the results. They can review the cases you flagged and judge whether they are indeed suspicious. You can also check historically known cases of fraud and see whether your model flags them.
In this case, you'll use the fraud labels to check your model results. The predicted cluster numbers are available under pred_labels, and the original fraud labels under labels.
This exercise is part of the course
Fraud Detection in Python
Exercise instructions
- Create a dataframe combining the cluster numbers with the actual labels. This has been done for you.
- Create a condition that flags fraud for the three smallest clusters: clusters 21, 17 and 9.
- Create a crosstab from the actual fraud labels with the newly created predicted fraud labels.
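The instructions name the three smallest clusters directly. As a minimal sketch of how such clusters could be identified from a vector of cluster numbers, the snippet below counts members per cluster and takes the three least populated ones. The array toy_pred_labels is a hypothetical stand-in for pred_labels, not the exercise data, so the cluster numbers it produces differ from 21, 17 and 9.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pred_labels: five clusters of sizes 50, 30, 3, 5 and 2
toy_pred_labels = np.repeat([0, 1, 2, 3, 4], [50, 30, 3, 5, 2])

# Count the members of each cluster and sort from smallest to largest
cluster_sizes = pd.Series(toy_pred_labels).value_counts().sort_values()

# Cluster numbers of the three least populated clusters
smallest_clusters = cluster_sizes.index[:3].tolist()
print(smallest_clusters)
```

With scikit-learn's DBSCAN, points labelled -1 are noise rather than a cluster, so you may want to exclude them before ranking cluster sizes.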
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
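# pred_labels and labels are provided by the exercise environment
import numpy as np
import pandas as pd

# Create a dataframe of the predicted cluster numbers and fraud labels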
df = pd.DataFrame({'clusternr':pred_labels,'fraud':labels})
# Create a condition flagging fraud for the smallest clusters
df['predicted_fraud'] = np.where((df['clusternr']==21)|(____)|(____), 1, 0)
# Run a crosstab on the results
print(pd.crosstab(df['fraud'], df['____'], rownames=['Actual Fraud'], colnames=['Flagged Fraud']))
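To see the same flag-and-crosstab pattern run end to end, here is a small self-contained sketch on made-up data. The arrays toy_pred_labels and toy_labels are hypothetical stand-ins for pred_labels and labels, and isin is used here as one possible alternative to chaining equality checks with |. It is an illustration under those assumptions, not the course's reference solution.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pred_labels and labels (ten observations)
toy_pred_labels = np.array([0, 0, 0, 21, 17, 9, 1, 1, 21, 0])
toy_labels = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 0])

# Combine the cluster numbers with the actual fraud labels
df = pd.DataFrame({'clusternr': toy_pred_labels, 'fraud': toy_labels})

# Flag every observation that falls in one of the three smallest clusters
df['predicted_fraud'] = np.where(df['clusternr'].isin([21, 17, 9]), 1, 0)

# Rows are the actual labels, columns are the flags
result = pd.crosstab(df['fraud'], df['predicted_fraud'],
                     rownames=['Actual Fraud'], colnames=['Flagged Fraud'])
print(result)
```

In the resulting table, the diagonal cells count the observations where the flag agrees with the actual label, and the off-diagonal cells count missed fraud cases and false alarms.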