
Exercise

Plotting your data

From the previous exercise, we know that the ratio of fraud to non-fraud observations is very low. You can do something about that, for example by re-sampling your data, which is explained in the next video.

In this exercise, you'll look at the data and visualize the fraud to non-fraud ratio. Looking at your data before you make any changes to it is always a good starting point in a fraud analysis.

Moreover, when talking to your colleagues, a picture often makes it very clear that you're dealing with heavily imbalanced data. Let's create a plot to visualize the ratio of fraud to non-fraud data points in the dataset df.
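
Before plotting, a quick numeric check of the imbalance can be useful. The sketch below is only an illustration: it assumes the labels sit in a column named Class, with 1 for fraud and 0 for non-fraud, and it uses a small made-up DataFrame (df_toy) in place of the real df, whose column names may differ.

```python
import pandas as pd

# Toy stand-in for df; the column names and values are assumptions
# made for illustration, not the exercise's actual data.
df_toy = pd.DataFrame({
    "V1": [0.1, -1.2, 0.4, 2.3, -0.5, 0.9, 1.1, -0.3, 0.0, 3.8],
    "Class": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Count observations per class, then express each count
# as a fraction of the total number of rows.
counts = df_toy["Class"].value_counts()
ratios = counts / len(df_toy)
print(ratios)
```

On real fraud data, the fraction for the fraud class is typically far smaller than in this toy example.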

The function prep_data() is already loaded in your workspace, as well as matplotlib.pyplot as plt.

Instructions

  • Define the plot_data(X, y) function, which plots the given feature set X with labels y in a scatter plot. This has been done for you.

  • Use the function prep_data() on your dataset df to create feature set X and labels y.

  • Run the function plot_data() on your newly obtained X and y to visualize your results.
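
The steps above could look roughly like the following sketch. The workspace versions of prep_data() and plot_data() are not shown in the exercise text, so both function bodies here are assumptions: prep_data_sketch() is a hypothetical stand-in that treats a column named Class as the label and every other column as a feature, and plot_data() scatters only the first two feature columns. The toy DataFrame df_toy is made up as well.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def prep_data_sketch(df):
    """Hypothetical stand-in for prep_data(): split df into features X and labels y."""
    # Assumption: the label column is called "Class"; all other columns are features.
    X = df.drop(columns="Class").to_numpy(dtype=float)
    y = df["Class"].to_numpy()
    return X, y


def plot_data(X, y):
    """Scatter the first two feature columns, one color per class label."""
    fig, ax = plt.subplots()
    ax.scatter(X[y == 0, 0], X[y == 0, 1], label="Class #0", alpha=0.5, linewidth=0.15)
    ax.scatter(X[y == 1, 0], X[y == 1, 1], label="Class #1", alpha=0.5, linewidth=0.15, c="r")
    ax.legend()
    return ax


# Toy, heavily imbalanced data: 200 rows, of which the first 5 are labeled as fraud.
rng = np.random.default_rng(0)
n = 200
df_toy = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Class": np.zeros(n, dtype=int),
})
df_toy.loc[:4, "Class"] = 1  # label-based slice, inclusive of row 4

# Create feature set X and labels y, then plot them.
X, y = prep_data_sketch(df_toy)
ax = plot_data(X, y)
```

Call plt.show() afterwards to display the figure. In the exercise itself, you would call the preloaded prep_data() on df instead of the stand-in used here.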