Exploring the traditional way to catch fraud

In this exercise, you'll try to find fraud cases in our credit card dataset the "old way". First, you'll define threshold values using common statistics to separate fraud from non-fraud cases. Then, you'll apply those thresholds to your features to detect fraud. This is common practice within fraud analytics teams.

Statistical thresholds are often determined by looking at the mean values of observations. Let's start this exercise by checking whether feature means differ between fraud and non-fraud cases. Then, you'll use that information to create common-sense thresholds. Finally, you'll check how well this rule performs in fraud detection.
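As a rough illustration of the first step, here is a minimal sketch of comparing feature means per class. The `toy` DataFrame and its values are made up for this example and only borrow the column names `V1`, `V3`, and `Class` from the exercise; the real `df` will give different numbers.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's credit card dataset.
# Class == 1 marks a fraud case, Class == 0 a non-fraud case.
toy = pd.DataFrame({
    "V1": [0.5, -0.2, 1.1, -4.0, -6.5, 0.3],
    "V3": [0.1, 0.4, -0.3, -6.2, -7.0, -5.5],
    "Class": [0, 0, 0, 1, 1, 0],
})

# One row per class, one column per feature, each cell holding the group mean
group_means = toy.groupby("Class").mean()
print(group_means)
```

In this toy table the fraud rows have clearly lower mean `V1` and `V3` than the non-fraud rows. A gap like that is what would suggest a simple "smaller than some cutoff" rule.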

pandas has already been imported as pd.

This exercise is part of the course

Fraud Detection in Python

Exercise instructions

  • Use groupby() to group df on Class and obtain the mean of the features.
  • Flag a case as fraud when V1 is smaller than -3 and V3 is smaller than -5.
  • As a measure of performance, use the crosstab function from pandas to compare our flagged fraud cases to actual fraud cases.
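The flagging and evaluation steps above can be sketched on a small made-up table. This is only an illustration under stated assumptions: the `toy` DataFrame is hypothetical, NumPy is imported explicitly here as `np`, and the thresholds are the ones given in the instructions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real exercise uses the course's df instead
toy = pd.DataFrame({
    "V1": [0.5, -0.2, 1.1, -4.0, -6.5, -3.5, -2.0],
    "V3": [0.1, 0.4, -0.3, -6.2, -7.0, -5.5, -6.0],
    "Class": [0, 0, 0, 1, 1, 0, 1],
})

# Flag a row as fraud (1) only when both threshold conditions hold, else 0
rule = np.logical_and(toy["V1"] < -3, toy["V3"] < -5)
toy["flag_as_fraud"] = np.where(rule, 1, 0)

# Rows are the true labels, columns are the rule's flags
table = pd.crosstab(
    toy["Class"],
    toy["flag_as_fraud"],
    rownames=["Actual Fraud"],
    colnames=["Flagged Fraud"],
)
print(table)
```

On this toy data the rule catches two of the three fraud cases, misses one, and wrongly flags one non-fraud case. The off-diagonal cells of the crosstab are where such a fixed rule falls short.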

Interactive exercise

Try this exercise by completing the sample code below.

# Get the mean for each group
____.____(____).mean()

# Implement a rule for stating which cases are flagged as fraud
df['flag_as_fraud'] = np.where(np.logical_and(______), 1, 0)

# Create a crosstab of flagged fraud cases versus the actual fraud cases
print(____(df.Class, df.flag_as_fraud, rownames=['Actual Fraud'], colnames=['Flagged Fraud']))