Undersampling training data

It's time to undersample the training set yourself with a few lines of pandas code. Once the undersampling is complete, check the value counts for loan_status to verify that the two classes are balanced.

X_y_train, count_nondefault, and count_default are already loaded in the workspace. They have been created using the following code:

X_y_train = pd.concat([X_train.reset_index(drop = True),
                       y_train.reset_index(drop = True)], axis = 1)
count_nondefault, count_default = X_y_train['loan_status'].value_counts()

The .value_counts() for the original training data will print automatically.
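
To make the setup code above concrete, here is a minimal sketch on a toy data set. The loan_amnt column, the sample values, and the coding of loan_status as 0 for non-default and 1 for default are assumptions made for illustration; they are not taken from the workspace. The sketch also shows why the two counts can be unpacked in that order: .value_counts() sorts by frequency in descending order, so the unpacking assumes non-defaults are the majority class.

```python
import pandas as pd

# Hypothetical stand-ins for the course's X_train and y_train
X_train = pd.DataFrame({"loan_amnt": [1000, 2500, 4000, 1200, 800, 3000]},
                       index=[10, 11, 12, 13, 14, 15])
y_train = pd.Series([0, 0, 1, 0, 0, 1], name="loan_status",
                    index=[10, 11, 12, 13, 14, 15])

# Resetting both indexes makes concat line the rows up by position
X_y_train = pd.concat([X_train.reset_index(drop=True),
                       y_train.reset_index(drop=True)], axis=1)

counts = X_y_train["loan_status"].value_counts()

# Positional unpacking relies on value_counts() sorting by count, descending
count_nondefault, count_default = counts

# Looking the counts up by label does not depend on which class is larger
# (assumes 0 = non-default, 1 = default)
assert count_nondefault == counts.loc[0]
assert count_default == counts.loc[1]
```

If the class sizes could ever flip, the label-based lookup is the safer of the two.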

Create data sets of non-defaults and defaults, stored as nondefaults and defaults.
Randomly sample count_default rows from nondefaults and store the result as nondefaults_under.
Concatenate nondefaults_under and defaults using pd.concat() and store the result as X_y_train_under.
Print the .value_counts() of loan_status for the new data set.
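
The four steps can be sketched as follows on a toy data set. This is one possible approach, not necessarily the exercise's reference solution. The toy values, the coding of loan_status as 0 for non-default and 1 for default, and the random_state argument (added only to make the sample reproducible) are assumptions.

```python
import pandas as pd

# Hypothetical toy training set (assumed coding: 0 = non-default, 1 = default)
X_y_train = pd.DataFrame({
    "loan_amnt": [1000, 2500, 4000, 1200, 800, 3000, 5000, 1500],
    "loan_status": [0, 0, 1, 0, 0, 1, 0, 0],
})
count_nondefault, count_default = X_y_train["loan_status"].value_counts()

# Step 1: split the training data by class
nondefaults = X_y_train[X_y_train["loan_status"] == 0]
defaults = X_y_train[X_y_train["loan_status"] == 1]

# Step 2: randomly keep only as many non-defaults as there are defaults
nondefaults_under = nondefaults.sample(n=count_default, random_state=0)

# Step 3: stack the undersampled non-defaults on top of all the defaults
X_y_train_under = pd.concat([nondefaults_under.reset_index(drop=True),
                             defaults.reset_index(drop=True)], axis=0)

# Step 4: both classes should now have the same number of rows
print(X_y_train_under["loan_status"].value_counts())
```

Note that undersampling discards majority-class rows, so the balanced set is smaller than the original training set.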
