Compare SMOTE to original data

In the last exercise, you saw that using SMOTE suddenly gives us more observations of the minority class. Let's compare those results to our original data, to get a good feeling for what has actually happened. Let's have a look at the value counts again of our old and new data, and let's plot the two scatter plots of the data side by side. You'll use the pre-defined function compare_plot() for that that, which takes the following arguments: X, y, X_resampled, y_resampled, method=''. The function plots your original data in a scatter plot, along with the resampled side by side.

Print the value counts of our original labels, y. Be mindful that y is currently a Numpy array, so in order to use value counts, we'll assign y back as a pandas Series object.
Repeat the step and print the value counts on y_resampled. This shows you how the balance between the two classes has changed with SMOTE.
Use the predefined compare_plot() function called on our original data as well our resampled data to see the scatterplots side by side.

Introduction and preparing your data

Fraud detection using labeled data

Fraud detection using unlabeled data

Fraud detection using text

Exercise

Compare SMOTE to original data

Instructions