
Compare SMOTE to original data

In the last exercise, you saw that SMOTE gives us more observations of the minority class. Let's compare those results to the original data to get a good feeling for what has actually happened. You'll look at the value counts of the old and new data again, and plot the two scatter plots side by side. For that you'll use the pre-defined function compare_plot(), which takes the following arguments: X, y, X_resampled, y_resampled, method=''. The function draws a scatter plot of your original data next to a scatter plot of the resampled data.
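The source of compare_plot() is not shown on this page, so the following is only a rough sketch of what such a helper might look like, assuming two-dimensional features and matplotlib. The name compare_plot_sketch, the toy data, and the row-repetition stand-in for resampling are all assumptions made for illustration; the stand-in is plain duplication, not SMOTE.

```python
import numpy as np
import matplotlib.pyplot as plt


def compare_plot_sketch(X, y, X_resampled, y_resampled, method=""):
    """Hypothetical stand-in for the course's compare_plot() helper."""
    fig, (ax_orig, ax_res) = plt.subplots(1, 2, figsize=(10, 4))
    panels = [
        (ax_orig, X, y, "Original data"),
        (ax_res, X_resampled, y_resampled, method or "Resampled data"),
    ]
    for ax, features, labels, title in panels:
        # One scatter call per class so each class gets its own colour
        for cls in np.unique(labels):
            mask = labels == cls
            ax.scatter(features[mask, 0], features[mask, 1],
                       label=f"Class {cls}", alpha=0.5)
        ax.set_title(title)
        ax.legend()
    return fig


# Toy data (assumption): 95 non-fraud rows, 5 fraud rows, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

# Crude stand-in for resampling: repeat the minority rows (NOT SMOTE)
minority = X[y == 1]
X_resampled = np.vstack([X, np.repeat(minority, 18, axis=0)])
y_resampled = np.concatenate([y, np.ones(90, dtype=int)])

fig = compare_plot_sketch(X, y, X_resampled, y_resampled, method="SMOTE")
```

In the exercise itself you would pass the arrays produced by the actual SMOTE step; call plt.show() to display the figure when running outside a notebook.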

This exercise is part of the course

Fraud Detection in Python

Exercise instructions

  • Print the value counts of our original labels, y. Because y is currently a NumPy array, convert it to a pandas Series object first so that the value counts can be computed.
  • Repeat the step and print the value counts on y_resampled. This shows you how the balance between the two classes has changed with SMOTE.
  • Call the predefined compare_plot() function on our original data as well as our resampled data to see the scatter plots side by side.
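The first two steps can be sketched on toy data. The arrays below are made-up stand-ins for y and y_resampled, not the course data. The sketch uses the Series method value_counts() rather than the top-level pd.value_counts() from the sample code, since newer pandas releases deprecate the latter.

```python
import numpy as np
import pandas as pd

# Toy labels (assumption): 0 = non-fraud, 1 = fraud
y = np.array([0] * 95 + [1] * 5)

# Made-up stand-in for a balanced, resampled label array
y_resampled = np.array([0] * 95 + [1] * 95)

# A NumPy array has no value_counts(), so wrap it in a pandas Series first
original_counts = pd.Series(y).value_counts()
resampled_counts = pd.Series(y_resampled).value_counts()

print(original_counts)
print(resampled_counts)
```

Comparing the two printouts shows how far the class balance has shifted after resampling.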

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print the value_counts on the original labels y
print(pd.value_counts(pd.Series(____)))

# Print the value_counts on the resampled labels y_resampled
print(____(____(____)))

# Run compare_plot
compare_plot(____, ____, ____, ____, method='SMOTE')