Checking results

In this exercise, you'll check the results of your DBSCAN fraud detection model. In practice, you often don't have reliable labels, and this is where a fraud analyst can help you validate the results. The analyst can review your results and judge whether the cases you flagged are indeed suspicious. You can also take historically known fraud cases and check whether your model flags them.

In this case, you'll use the fraud labels to check your model results. The predicted cluster numbers are available as pred_labels, and the original fraud labels as labels.

This exercise is part of the course

Fraud Detection in Python

Exercise instructions

  • Create a dataframe combining the cluster numbers with the actual labels. This has been done for you.
  • Create a condition that flags fraud for the three smallest clusters: clusters 21, 17 and 9.
  • Create a crosstab from the actual fraud labels with the newly created predicted fraud labels.
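
The steps above can be sketched end to end on made-up data. Everything in this block is an assumption for illustration: the synthetic pred_labels and labels arrays, the cluster numbers 2, 3 and 4 (standing in for 21, 17 and 9), and the use of value_counts with isin instead of the chained == comparisons in the exercise template. Note also that scikit-learn's DBSCAN labels noise points as -1; this sketch assumes there are none.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the exercise's pred_labels and labels (assumed values).
# Two large clusters (0 and 1) and three tiny ones (2, 3 and 4).
pred_labels = np.array([0] * 50 + [1] * 40 + [2] * 3 + [3] * 2 + [4] * 1)
labels = np.array([0] * 50 + [0] * 40 + [1, 1, 0] + [1, 0] + [1])

# Combine the cluster numbers with the actual fraud labels.
df = pd.DataFrame({'clusternr': pred_labels, 'fraud': labels})

# Count the points in each cluster and pick the three smallest clusters.
cluster_sizes = df['clusternr'].value_counts()
smallest_clusters = cluster_sizes.nsmallest(3).index.tolist()

# Flag every point that falls in one of those clusters as predicted fraud.
df['predicted_fraud'] = np.where(df['clusternr'].isin(smallest_clusters), 1, 0)

# Compare actual fraud with flagged fraud.
result = pd.crosstab(df['fraud'], df['predicted_fraud'],
                     rownames=['Actual Fraud'], colnames=['Flagged Fraud'])
print(result)
```

In the exercise itself the three cluster numbers are given, so you don't need to derive them; the value_counts step only shows one way such a list might be obtained.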

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a dataframe of the predicted cluster numbers and fraud labels 
df = pd.DataFrame({'clusternr':pred_labels,'fraud':labels})

# Create a condition flagging fraud for the smallest clusters 
df['predicted_fraud'] = np.where((df['clusternr'] == 21) | (____) | (____), 1, 0)

# Run a crosstab on the results 
print(pd.crosstab(df['fraud'], df['____'], rownames=['Actual Fraud'], colnames=['Flagged Fraud']))
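
Once you have the crosstab, one way to read it is to pull out the individual cells and summarize them as precision and recall. The sketch below is not part of the exercise: the actual and flagged series are hypothetical values chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical labels: 94 legitimate cases followed by 8 fraud cases.
actual = pd.Series([0] * 94 + [1] * 8)
# Hypothetical flags: 4 legitimate cases flagged, 6 of the 8 fraud cases flagged.
flagged = pd.Series([0] * 90 + [1] * 4 + [0] * 2 + [1] * 6)

ct = pd.crosstab(actual, flagged,
                 rownames=['Actual Fraud'], colnames=['Flagged Fraud'])

# Rows are the actual labels, columns are the flags.
true_positives = ct.loc[1, 1]   # fraud cases that were flagged
false_positives = ct.loc[0, 1]  # legitimate cases that were flagged
false_negatives = ct.loc[1, 0]  # fraud cases that were missed

# Precision: share of flagged cases that are really fraud.
precision = true_positives / (true_positives + false_positives)
# Recall: share of fraud cases that were flagged.
recall = true_positives / (true_positives + false_negatives)

print(ct)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With these made-up numbers, 6 of the 10 flagged cases are fraud and 6 of the 8 fraud cases are flagged, so the block reports a precision of 0.60 and a recall of 0.75.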