Resampling techniques
In the last exercise, you saw how class imbalance can impact the results of your confusion matrix. In this exercise, you'll practice resampling techniques to explore how different resampling styles affect a dataset with class imbalance like that seen with loan_data. Using sklearn's resample() function, resampling the minority class to match the number of rows in the majority class is called upsampling, while resampling the majority class to match the number of rows in the minority class is called downsampling.
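To make the two definitions concrete, here is a minimal sketch on a toy DataFrame. The data, column names, and variable names are made up for illustration and are not part of loan_data; only resample() from sklearn.utils and pd.concat() are real library calls.

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced data (hypothetical): 8 majority rows, 2 minority rows
toy = pd.DataFrame({
    "feature": range(10),
    "label": [0] * 8 + [1] * 2,
})
majority = toy[toy["label"] == 0]
minority = toy[toy["label"] == 1]

# Upsampling: draw minority rows WITH replacement until they match the majority count
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=123)
toy_upsampled = pd.concat([majority, minority_up])

# Downsampling: draw majority rows WITHOUT replacement down to the minority count
majority_down = resample(majority, replace=False, n_samples=len(minority), random_state=123)
toy_downsampled = pd.concat([majority_down, minority])

# Both results are balanced, but they differ in total size
print(toy_upsampled["label"].value_counts().to_dict())
print(toy_downsampled["label"].value_counts().to_dict())
```

Upsampling keeps every original row and repeats minority rows, so the result grows; downsampling discards majority rows, so the result shrinks.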
You will create both an upsampled and a downsampled version of the loan_data dataset, apply a logistic regression to each of them, and then evaluate your performance. The training data and its labels have already been subset into deny, which contains only the minority class, and approve, which contains only the majority class.
The testing object from a train/test split, used for making predictions, has been saved to the workspace as X_test for your use in the exercises.
This exercise is part of the course Practicing Machine Learning Interview Questions in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Upsample minority and combine with majority
loans_upsampled = ____(deny, replace=True, n_samples=len(____), random_state=123)
upsampled = pd.concat([approve, loans_upsampled])
# Downsample majority and combine with minority
loans_downsampled = ____(____, replace=False, n_samples=len(deny), random_state=123)
downsampled = pd.concat([loans_downsampled, deny])
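The remaining steps described above, fitting a logistic regression on each resampled set and evaluating it, might look roughly like the sketch below. It does not use the course's loan_data: the synthetic data, the column names (income, debt_ratio, status), and the held-out test frame are assumptions made so that the example runs on its own. In the exercise workspace, deny, approve, and X_test are already provided.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for loan_data (hypothetical columns, NOT the course dataset)
rng = np.random.default_rng(123)
n = 1000
data = pd.DataFrame({
    "income": rng.normal(50, 15, n),
    "debt_ratio": rng.normal(0.3, 0.1, n),
})
score = -0.05 * data["income"] + 8 * data["debt_ratio"] + rng.normal(0, 1, n)
# Label the top ~12% of scores as the minority class (1 = deny, 0 = approve)
data["status"] = (score > score.quantile(0.88)).astype(int)

# Hold out a test set, then split the training rows by class
train, test = train_test_split(
    data, test_size=0.3, stratify=data["status"], random_state=123
)
deny = train[train["status"] == 1]      # minority class
approve = train[train["status"] == 0]   # majority class

# Upsample minority and combine with majority
loans_upsampled = resample(deny, replace=True, n_samples=len(approve), random_state=123)
upsampled = pd.concat([approve, loans_upsampled])

# Downsample majority and combine with minority
loans_downsampled = resample(approve, replace=False, n_samples=len(deny), random_state=123)
downsampled = pd.concat([loans_downsampled, deny])

# Fit one model per resampled training set and evaluate on the same test set
features = ["income", "debt_ratio"]
results = {}
for name, frame in {"upsampled": upsampled, "downsampled": downsampled}.items():
    model = LogisticRegression(max_iter=1000)
    model.fit(frame[features], frame["status"])
    preds = model.predict(test[features])
    results[name] = {
        "confusion_matrix": confusion_matrix(test["status"], preds),
        "recall_deny": recall_score(test["status"], preds),
    }
    print(name)
    print(results[name]["confusion_matrix"])
```

Only the training rows are resampled here; the test set keeps its original class balance so that the confusion matrices reflect how each model would behave on imbalanced data.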