
Checking the fraud to non-fraud ratio

In this chapter, you will work on creditcard_sampledata.csv, a dataset of credit card transactions. Fortunately, fraud cases are an extreme minority among these transactions.

However, machine learning algorithms usually work best when the classes in a dataset are present in roughly equal proportions. If there are only a few cases of fraud, there is little data from which to learn how to identify them. This is known as class imbalance, and it is one of the main challenges of fraud detection.
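
To make the imbalance problem concrete, here is a minimal sketch in plain Python. The counts (10,000 transactions, 50 of them fraudulent) are hypothetical and chosen only for illustration; they are not taken from the course dataset. The sketch shows that a "model" which never flags fraud still scores a high accuracy while catching nothing.

```python
# Hypothetical counts for illustration only (not from the course dataset)
n_total = 10_000
n_fraud = 50

# Labels: 1 = fraud, 0 = non-fraud
labels = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial "model" that ignores its input and always predicts non-fraud
predictions = [0] * n_total

# Fraction of transactions labelled correctly
accuracy = sum(p == y for p, y in zip(predictions, labels)) / n_total

# Number of fraud cases the model actually flagged
fraud_caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

print(f"Fraud share:  {n_fraud / n_total:.2%}")
print(f"Accuracy:     {accuracy:.2%}")
print(f"Fraud caught: {fraud_caught} of {n_fraud}")
```

With so few positive examples, accuracy alone says very little, which is why checking the class ratio is a sensible first step.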

Let's explore this dataset, and observe this class imbalance problem.

This exercise is part of the course

Fraud Detection in Python

Exercise instructions

  • Import pandas as pd, read the credit card data in and assign it to df. This has been done for you.
  • Use .info() to print information about df.
  • Use .value_counts() to get the count of fraudulent and non-fraudulent transactions in the 'Class' column. Assign the result to occ.
  • Get the ratio of fraudulent transactions over the total number of transactions in the dataset.
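
The steps above can be sketched on a small synthetic DataFrame. This is an illustration under stated assumptions, not the exercise solution: the 'Amount' column and every value below are made up, and only the 'Class' column name (1 for fraud, 0 otherwise) follows the instructions.

```python
import pandas as pd

# Synthetic stand-in for the course data; all values are invented.
# 'Class' is assumed to be 1 for fraud and 0 for non-fraud.
df = pd.DataFrame({
    "Amount": [12.5, 99.0, 3.2, 250.0, 18.75, 42.0, 7.1, 64.3, 5.0, 130.0],
    "Class":  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

# .info() prints its summary itself and returns None
df.info()

# Count the rows in each class
occ = df["Class"].value_counts()
print(occ)

# Share of each class among all transactions
ratio = occ / len(df)
print(ratio)
```

Because .info() returns None, wrapping it in print() also prints a trailing "None". As an alternative to dividing by len(df), value_counts(normalize=True) returns the class shares directly.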

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import pandas and read csv
import pandas as pd
df = pd.read_csv("creditcard_data.csv")

# Explore the features available in your dataframe
print(df.____)

# Count the occurrences of fraud and no fraud and print them
occ = df['____'].____()
print(occ)

# Print the ratio of fraud cases
print(occ / ____)