Checking the fraud to non-fraud ratio

In this chapter, you will work on creditcard_sampledata.csv, a dataset containing credit card transaction data. Fortunately, fraud cases are an extreme minority among these transactions.

However, machine learning algorithms usually work best when the different classes in a dataset are more or less equally represented. If there are few cases of fraud, there is little data from which to learn how to identify them. This is known as class imbalance, and it's one of the main challenges of fraud detection.

Let's explore this dataset, and observe this class imbalance problem.

Import pandas as pd, read the credit card data in and assign it to df. This has been done for you.
Use .info() to print information about df.
Use .value_counts() to get the count of fraudulent and non-fraudulent transactions in the 'Class' column. Assign the result to occ.
Get the ratio of fraudulent transactions over the total number of transactions in the dataset.
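The steps above can be sketched roughly as follows. This is only an illustration, not the exercise's official solution: so that it runs on its own, it uses a tiny made-up DataFrame in place of creditcard_sampledata.csv, and it assumes that a 'Class' value of 1 marks a fraudulent transaction and 0 a non-fraudulent one. The 'Amount' column and all values are invented for the example.

```python
import pandas as pd

# In the exercise, df is read from the course file, for example:
# df = pd.read_csv("creditcard_sampledata.csv")
# Here a small, made-up frame stands in so the sketch is self-contained.
# Assumption: Class == 1 means fraud, Class == 0 means non-fraud.
df = pd.DataFrame({
    "Amount": [12.5, 250.0, 3.99, 78.1, 1020.0, 15.0, 42.0, 8.75, 60.0, 5.0],
    "Class": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Print column names, dtypes, and non-null counts
df.info()

# Count the occurrences of each class label
occ = df["Class"].value_counts()
print(occ)

# Divide each count by the total number of rows to get the class ratios
ratio = occ / len(df)
print(ratio)
```

In this toy frame, one of ten rows is labeled as fraud, so the fraud ratio is 0.1. On the real dataset, the same division would be expected to show a much smaller share of fraud cases.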
