Default classification reporting
It's time to take a closer look at how the model is evaluated. Setting a threshold on the probability of default turns the predicted probabilities into class labels, which is what lets you analyze the model's performance with a classification report.
Creating a data frame of the probabilities makes them easier to work with, because you can use all the power of pandas. Apply the threshold to the data and check the value counts for both classes of loan_status to see how many predictions of each class are being created. These counts help you interpret the scores in the classification report.
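As a rough illustration of the thresholding step described above, the sketch below uses a small made-up array in place of the real preds (the five probability pairs are hypothetical, not course data). It assumes preds has the two-column shape returned by a scikit-learn predict_proba call, with the probability of default in the second column, and it uses a vectorized comparison, which should give the same labels as applying a function row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for preds: column 0 = P(non-default), column 1 = P(default)
preds = np.array([
    [0.90, 0.10],
    [0.35, 0.65],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.70, 0.30],
])

# Keep only the probability of default
preds_df = pd.DataFrame(preds[:, 1], columns=["prob_default"])

# Label a loan as default (1) when its probability exceeds the threshold, else 0
threshold = 0.50
preds_df["loan_status"] = (preds_df["prob_default"] > threshold).astype(int)

# Count how many loans fall into each predicted class
counts = preds_df["loan_status"].value_counts()
print(counts)
```

With these toy numbers, three loans are labeled non-default and two are labeled default. Lowering the threshold would move borderline loans, such as the one at 0.45, into the default class.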
The cr_loan_prep data set, trained logistic regression clf_logistic, true loan status values y_test, and predicted probabilities preds are loaded in the workspace.
This exercise is part of the course
Credit Risk Modeling in Python
Exercise instructions
- Create a data frame of just the probabilities of default from preds called preds_df.
- Reassign loan_status values based on a threshold of 0.50 for probability of default in preds_df.
- Print the value counts of the number of rows for each loan_status.
- Print the classification report using y_test and preds_df.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a dataframe for the probabilities of default
____ = pd.____(____[:,1], columns = ['prob_default'])
# Reassign loan status based on the threshold
____[____] = ____[____].apply(lambda x: 1 if x > ____ else 0)
# Print the row counts for each loan status
print(____[____].____())
# Print the classification report
target_names = ['Non-Default', 'Default']
print(____(____, ____['loan_status'], target_names=target_names))
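For readers without access to the course workspace, here is one possible end-to-end sketch of the same steps on synthetic data. The y_test and preds arrays below are invented for illustration and are not the course's objects; the only library calls used are pandas.DataFrame, Series.apply, Series.value_counts, and sklearn.metrics.classification_report.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report

# Made-up stand-ins for the workspace objects (not the real course data)
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 0])
preds = np.array([
    [0.85, 0.15],
    [0.40, 0.60],
    [0.30, 0.70],
    [0.65, 0.35],
    [0.90, 0.10],
    [0.20, 0.80],
    [0.75, 0.25],
    [0.55, 0.45],
])

# Data frame holding only the probability of default (second column)
preds_df = pd.DataFrame(preds[:, 1], columns=["prob_default"])

# Assign a predicted loan status using a 0.50 threshold
preds_df["loan_status"] = preds_df["prob_default"].apply(
    lambda x: 1 if x > 0.50 else 0
)

# Row counts for each predicted loan status
print(preds_df["loan_status"].value_counts())

# Precision, recall, and F1 for each class, compared against the true labels
target_names = ["Non-Default", "Default"]
report = classification_report(
    y_test, preds_df["loan_status"], target_names=target_names
)
print(report)
```

Note that the value counts describe the predicted labels, while the support column in the classification report counts the true labels in y_test, so the two sets of numbers will generally differ.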