Compare SMOTE to original data
In the last exercise, you saw that using SMOTE gives us more observations of the minority class. Let's compare those results to our original data to get a good feeling for what has actually happened. Let's look again at the value counts of our old and new data, and plot the two scatter plots of the data side by side. You'll use the pre-defined function `compare_plot()` for that, which takes the following arguments: `X`, `y`, `X_resampled`, `y_resampled`, `method=''`. The function plots your original data in a scatter plot, with the resampled data alongside it.
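The body of `compare_plot()` is not shown in this exercise, so the following is only a rough sketch of what such a helper might look like, assuming matplotlib and two-dimensional feature arrays. The name `compare_plot_sketch`, the colours, the panel title "Original set", and the toy data are all assumptions made for illustration, not the course's actual implementation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def compare_plot_sketch(X, y, X_resampled, y_resampled, method=""):
    """Hypothetical stand-in for compare_plot(): two scatter plots side by side."""
    fig, (ax_orig, ax_res) = plt.subplots(1, 2, figsize=(10, 4))
    panels = [
        (ax_orig, X, y, "Original set"),
        (ax_res, X_resampled, y_resampled, method),
    ]
    for ax, features, labels, title in panels:
        features = np.asarray(features)
        labels = np.asarray(labels)
        # Draw each class in its own colour, using the first two feature columns.
        for cls, colour in ((0, "tab:blue"), (1, "tab:red")):
            mask = labels == cls
            ax.scatter(features[mask, 0], features[mask, 1],
                       c=colour, alpha=0.5, label=f"Class #{cls}")
        ax.set_title(title)
        ax.legend()
    return fig


# Toy data (made up for illustration): 8 non-fraud points and 2 fraud points.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 2))
y_demo = np.array([0] * 8 + [1] * 2)

# Pretend resampling added 6 minority points; real SMOTE would interpolate them
# between existing minority samples rather than draw them at random.
X_res_demo = np.vstack([X_demo, rng.normal(loc=2.0, size=(6, 2))])
y_res_demo = np.concatenate([y_demo, np.ones(6, dtype=int)])

fig = compare_plot_sketch(X_demo, y_demo, X_res_demo, y_res_demo, method="SMOTE")
```

In this sketch the left panel always shows the original data and the right panel takes its title from `method`, which is one plausible reading of how the `method` argument is used.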
This exercise is part of the course
Fraud Detection in Python
Exercise instructions
- Print the value counts of our original labels, `y`. Be mindful that `y` is currently a NumPy array, so in order to use value counts, we'll assign `y` back as a pandas Series object.
- Repeat the step and print the value counts on `y_resampled`. This shows you how the balance between the two classes has changed with SMOTE.
- Use the predefined `compare_plot()` function, called on our original data as well as our resampled data, to see the scatter plots side by side.
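As a small illustration of the first two steps, the snippet below shows why the array has to be wrapped in a Series before counting. The label array `y_demo` is made up for this example and is not the course's data.

```python
import numpy as np
import pandas as pd

# Made-up labels for illustration: 0 = non-fraud, 1 = fraud.
y_demo = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# A NumPy array has no value_counts() method, so wrap it in a Series first.
counts = pd.Series(y_demo).value_counts()
print(counts)

# Passing normalize=True reports class proportions instead of raw counts.
proportions = pd.Series(y_demo).value_counts(normalize=True)
print(proportions)
```

The same pattern applies to `y_resampled`. Comparing the two outputs shows how far resampling has moved the class balance.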
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print the value_counts on the original labels y
print(pd.value_counts(pd.Series(____)))
# Print the value_counts on the resampled labels y_resampled
print(____(____(____)))
# Run compare_plot
compare_plot(____, ____, ____, ____, method='SMOTE')