Plotting your data

From the previous exercise, we know that the ratio of fraud to non-fraud observations is very low. You can address this, for example by resampling the data, which is explained in the next video.

In this exercise, you'll look at the data and visualize the fraud to non-fraud ratio. Looking at your data first, before you make any changes to it, is always a good starting point in a fraud analysis.
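Before plotting anything, a quick class count already shows the imbalance. The snippet below is a minimal sketch on a tiny made-up DataFrame: the column names V1, V2 and Class (1 = fraud, 0 = non-fraud) are assumptions for illustration, not necessarily the layout of the exercise's df.

```python
import pandas as pd

# Hypothetical stand-in for df. "Class" is an assumed label column
# (1 = fraud, 0 = non-fraud); V1 and V2 are made-up feature columns.
df = pd.DataFrame({
    "V1": [0.1, -1.2, 0.4, 2.3, -0.7, 0.9, 1.5, -0.3, 0.0, 3.1],
    "V2": [1.0, 0.2, -0.5, 0.7, 1.1, -0.9, 0.3, 0.8, -1.4, 2.9],
    "Class": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Number of observations per class
counts = df["Class"].value_counts()

# Share of fraud cases among all observations
fraud_share = counts.get(1, 0) / len(df)

print(counts)
print(f"Fraud share: {fraud_share:.1%}")
```

On real fraud data the share is typically far smaller than in this toy example, which is why a plot helps make the point.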

Moreover, when talking to your colleagues, a picture often makes it very clear that you're dealing with heavily imbalanced data. Let's create a plot to visualize the ratio of fraud to non-fraud data points in the dataset df.

The function prep_data() is already loaded in your workspace, as well as matplotlib.pyplot as plt.

Define the plot_data(X, y) function, which plots the given feature set X with labels y in a scatter plot. This has been done for you.
Use the function prep_data() on your dataset df to create feature set X and labels y.
Run the function plot_data() on your newly obtained X and y to visualize your results.
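The steps above could look roughly like the sketch below. The bodies of prep_data() and plot_data() are not shown in this exercise, so both definitions here are assumptions: prep_data() is assumed to split off a label column named "Class", and plot_data() is assumed to scatter the first two features colored by label. The synthetic df is made up so that the example runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def prep_data(df, label_col="Class"):
    """Assumed behavior: return features X as a float array and labels y."""
    X = df.drop(columns=[label_col]).to_numpy(dtype=float)
    y = df[label_col].to_numpy()
    return X, y


def plot_data(X, y):
    """Scatter the first two features, one color per class."""
    fig, ax = plt.subplots()
    ax.scatter(X[y == 0, 0], X[y == 0, 1],
               label="Class #0 (non-fraud)", alpha=0.5, linewidth=0.15)
    ax.scatter(X[y == 1, 0], X[y == 1, 1],
               label="Class #1 (fraud)", alpha=0.5, linewidth=0.15, c="r")
    ax.legend()
    return ax


# Synthetic, heavily imbalanced stand-in for df (990 non-fraud, 10 fraud).
rng = np.random.default_rng(0)
n_normal, n_fraud = 990, 10
df = pd.DataFrame({
    "V1": np.concatenate([rng.normal(0, 1, n_normal), rng.normal(3, 1, n_fraud)]),
    "V2": np.concatenate([rng.normal(0, 1, n_normal), rng.normal(3, 1, n_fraud)]),
    "Class": [0] * n_normal + [1] * n_fraud,
})

# Create feature set X and labels y, then plot them
X, y = prep_data(df)
ax = plot_data(X, y)
```

Call plt.show() afterwards to display the figure. With only a handful of red points among hundreds of blue ones, the imbalance is visible at a glance.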
