Checking the fraud to non-fraud ratio
In this chapter, you will work on creditcard_sampledata.csv, a dataset containing credit card transaction data. Fortunately, fraud occurrences are an extreme minority in these transactions.
However, machine learning algorithms usually work best when the different classes in the dataset are more or less equally represented. If there are few cases of fraud, there is little data from which to learn how to identify them. This is known as class imbalance, and it is one of the main challenges of fraud detection.
Let's explore this dataset, and observe this class imbalance problem.
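To make the class imbalance problem concrete, here is a minimal sketch using made-up labels (990 non-fraud and 10 fraud cases, which are illustrative numbers and not taken from the dataset). It shows how a naive rule that never flags fraud can still look accurate while catching no fraud at all.

```python
# Hypothetical labels: 0 = non-fraud, 1 = fraud (made-up counts, not from the dataset)
labels = [0] * 990 + [1] * 10

# A naive "model" that predicts non-fraud for every transaction
predictions = [0] * len(labels)

# Accuracy: share of predictions that match the true label
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on fraud: share of actual fraud cases that were flagged
fraud_total = sum(labels)
fraud_caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
fraud_recall = fraud_caught / fraud_total

print(f"Accuracy: {accuracy:.3f}")
print(f"Fraud recall: {fraud_recall:.3f}")
```

With so few positive examples, overall accuracy says little about how well fraud is detected, which is why checking the class ratio first is a useful habit.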
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Import pandas as pd, read the credit card data in and assign it to df. This has been done for you.
- Use .info() to print information about df.
- Use .value_counts() to get the count of fraudulent and non-fraudulent transactions in the 'Class' column. Assign the result to occ.
- Get the ratio of fraudulent transactions over the total number of transactions in the dataset.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas and read csv
import pandas as pd
df = pd.read_csv("creditcard_data.csv")
# Explore the features available in your dataframe
print(df.____)
# Count the occurrences of fraud and no fraud and print them
occ = df['____'].____()
print(occ)
# Print the ratio of fraud cases
print(occ / ____)
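One possible way to fill in the blanks is sketched below. Since the exercise file is not available here, the sketch builds a small hypothetical DataFrame in place of the CSV: the 'Amount' column and all values are invented for illustration, and only the 'Class' column name comes from the instructions above. Dividing by len(df) is one reasonable choice for the total number of transactions; it is not necessarily the course's reference solution.

```python
import pandas as pd

# Hypothetical stand-in for the exercise data: the real CSV is not used here.
# 'Class' is 1 for fraud and 0 for non-fraud; 'Amount' is an invented feature.
df = pd.DataFrame({
    "Amount": [12.5, 80.0, 3.2, 45.1, 9.9, 150.0, 22.3, 5.0, 60.4, 300.0],
    "Class": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Explore the features available in the dataframe.
# .info() prints its summary directly and returns None.
df.info()

# Count the occurrences of fraud and no fraud
occ = df["Class"].value_counts()
print(occ)

# Ratio of each class over the total number of transactions
ratio = occ / len(df)
print(ratio)
```

Note that because .info() prints directly and returns None, wrapping it in print() as the template does will also print a trailing None.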