Column selection and model performance
Creating the training set from different combinations of columns affects the model and the importance values of the columns. Does a different selection of columns also affect the F-1 score of the model, which combines precision and recall? You can answer this question by training two different models on two different sets of columns and checking their performance.
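As a reminder of how the F-1 score combines the two metrics, here is a minimal sketch that computes it as the harmonic mean of precision and recall. The function name `f1_from` and the numeric values are made up for illustration and are not taken from the exercise data.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical default-class metrics for two models (illustrative values only)
f1_a = f1_from(precision=0.90, recall=0.60)  # precise, but misses many defaults
f1_b = f1_from(precision=0.75, recall=0.75)  # balanced precision and recall

print(round(f1_a, 2), round(f1_b, 2))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model with high precision but low recall on defaults can still score lower than a more balanced one.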
Incorrectly predicting defaults as non-default can result in unexpected losses if the predicted probability of default for these loans was very low. You can use the F-1 score for defaults to see how accurately each model predicts the defaults.
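To see why the F-1 score for the default class is worth checking on its own, the sketch below computes precision, recall, and F-1 for defaults by hand for two sets of predictions. The labels, the predictions, and the helper `class_metrics` are hypothetical and do not come from `cr_loan_prep`, `gbt`, or `gbt2`.

```python
def class_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 1 = default, 0 = non-default
y_true  = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds_a = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # misses three of the four defaults
preds_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]  # catches three defaults, two false alarms

# Both prediction sets get 7 of 10 loans right, so accuracy cannot separate them
acc_a = sum(t == p for t, p in zip(y_true, preds_a)) / len(y_true)
acc_b = sum(t == p for t, p in zip(y_true, preds_b)) / len(y_true)

_, recall_a, f1_a = class_metrics(y_true, preds_a)
_, recall_b, f1_b = class_metrics(y_true, preds_b)

print(f"A: accuracy={acc_a:.2f} default recall={recall_a:.2f} default F-1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f} default recall={recall_b:.2f} default F-1={f1_b:.2f}")
```

In this toy case the two prediction sets have identical accuracy, yet the default-class F-1 differs noticeably because the first set labels most defaults as non-default.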
The credit data, cr_loan_prep, and the two training column sets, X and X2, have been loaded in the workspace. The models gbt and gbt2 have already been trained.
This exercise is part of the course Credit Risk Modeling in Python.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Predict the loan_status using each model
____ = gbt.____(____)
____ = gbt2.____(____)
# Print the classification report of the first model
target_names = ['Non-Default', 'Default']
print(____(____, ____, target_names=target_names))
# Print the classification report of the second model
print(____(____, ____, target_names=target_names))