LoslegenKostenlos loslegen

Resampling techniques

In the last exercise, you saw how class imbalance can impact the results of your confusion matrix. In this exercise, you'll practice resampling techniques to explore the different results that alternative resampling styles can have on a dataset with class imbalance like that seen with loan_data. Using sklearn's resample() function, matching the number of rows in the majority class is called upsampling, while matching the number of rows in the minority class is called downsampling.

You will create both an upsampled and downsampled version of the loan_data dataset, apply a logistic regression on both of them and then evaluate your performance. The training data and its labels that correspond to deny are subset to contain only the minority class and to approve that correspond to the majority.

A train/test split testing object for making predictions has been saved to the workspace as X_test for your use in the exercises.

Diese Übung ist Teil des Kurses

Practicing Machine Learning Interview Questions in Python

Kurs anzeigen

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

# Upsample minority and combine with majority
loans_upsampled = ____(deny, replace=True, n_samples=len(____), random_state=123)
upsampled = pd.concat([approve, loans_upsampled])

# Downsample majority and combine with minority
loans_downsampled = ____(____, replace = False,  n_samples = len(deny), random_state = 123)
downsampled = pd.concat([loans_downsampled, deny])
Code bearbeiten und ausführen