Column selection and model performance
Creating the training set from different combinations of columns affects the model and the importance values of the columns. Does a different selection of columns also affect the F-1 scores, the combination of precision and recall, of the model? You can answer this question by training two different models on two different sets of columns and checking their performance.
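Recall that the F-1 score is the harmonic mean of precision and recall: F-1 = 2 * (precision * recall) / (precision + recall), so a model only scores well when both measures are high.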
Inaccurately predicting defaults as non-defaults can result in unexpected losses, since the predicted probability of default for those loans was very low. You can use the F-1 score for defaults to see how accurately each model predicts the defaults.
The credit data cr_loan_prep and the two training column sets X and X2 have been loaded in the workspace. The models gbt and gbt2 have already been trained.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Predict the loan_status using each model
____ = gbt.____(____)
____ = gbt2.____(____)
# Print the classification report of the first model
target_names = ['Non-Default', 'Default']
print(____(____, ____, target_names=target_names))
# Print the classification report of the second model
print(____(____, ____, target_names=target_names))
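If you want to try the comparison outside the course workspace, the sketch below builds a small synthetic stand-in for cr_loan_prep, trains two gradient boosted models on different column sets, and prints both classification reports. The synthetic data, the column choices, and the use of scikit-learn's GradientBoostingClassifier are illustrative assumptions, not the course's workspace objects or its exact modeling setup.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-in for cr_loan_prep (column names chosen for illustration)
cr_loan_prep = pd.DataFrame({
    'loan_int_rate': rng.uniform(5, 20, n),
    'person_income': rng.uniform(20_000, 120_000, n),
    'loan_percent_income': rng.uniform(0.05, 0.6, n),
    'person_emp_length': rng.integers(0, 20, n)
})

# Make defaults more likely for high rates and high loan-to-income ratios
logit = 0.3 * cr_loan_prep['loan_int_rate'] + 8 * cr_loan_prep['loan_percent_income'] - 7
cr_loan_prep['loan_status'] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Two different column selections
y = cr_loan_prep['loan_status']
X = cr_loan_prep[['loan_int_rate', 'person_income', 'loan_percent_income', 'person_emp_length']]
X2 = cr_loan_prep[['loan_int_rate', 'person_emp_length']]

# The same random_state keeps the same rows in both test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=123)
X2_train, X2_test, _, _ = train_test_split(X2, y, test_size=0.4, random_state=123)

# Train one model per column set
gbt = GradientBoostingClassifier().fit(X_train, y_train)
gbt2 = GradientBoostingClassifier().fit(X2_train, y_train)

# Predict the loan_status using each model
gbt_preds = gbt.predict(X_test)
gbt2_preds = gbt2.predict(X2_test)

# Print the classification report of each model
target_names = ['Non-Default', 'Default']
print(classification_report(y_test, gbt_preds, target_names=target_names))
print(classification_report(y_test, gbt2_preds, target_names=target_names))

Comparing the Default row's F-1 score across the two reports shows which column selection predicts defaults more accurately.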