LoslegenKostenlos loslegen

Checking the fraud to non-fraud ratio

In this chapter, you will work on creditcard_sampledata.csv, a dataset containing credit card transactions data. Fraud occurrences are fortunately an extreme minority in these transactions.

However, Machine Learning algorithms usually work best when the different classes contained in the dataset are more or less equally present. If there are few cases of fraud, then there's little data to learn how to identify them. This is known as class imbalance, and it's one of the main challenges of fraud detection.

Let's explore this dataset, and observe this class imbalance problem.

Diese Übung ist Teil des Kurses

Fraud Detection in Python

Kurs anzeigen

Anleitung zur Übung

  • Import pandas as pd, read the credit card data in and assign it to df. This has been done for you.
  • Use .info() to print information about df.
  • Use .value_counts() to get the count of fraudulent and non-fraudulent transactions in the'Class' column. Assign the result to occ.
  • Get the ratio of fraudulent transactions over the total number of transactions in the dataset.

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

# Import pandas and read csv
import pandas as pd
df = pd.read_csv("creditcard_data.csv")

# Explore the features available in your dataframe
print(df.____)

# Count the occurrences of fraud and no fraud and print them
occ = df['____'].____()
print(occ)

# Print the ratio of fraud cases
print(occ / ____)
Code bearbeiten und ausführen