
Column selection and model performance

Creating the training set from different combinations of columns affects the model and the importance values of the columns. Does a different selection of columns also affect the model's F-1 scores, which combine precision and recall into a single number? You can answer this question by training two models on two different sets of columns and comparing their performance.
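Below is a minimal sketch of this comparison, assuming scikit-learn and pandas are available. It does not use the course's data or models. `make_classification` generates a synthetic, imbalanced stand-in for `cr_loan_prep`, and scikit-learn's `GradientBoostingClassifier` stands in for `gbt` and `gbt2`. The column names `col_0` to `col_5` and both column subsets are hypothetical.

```python
# Sketch: compare the default-class F-1 score of two models trained on
# different column subsets.
# Assumptions: synthetic data stands in for cr_loan_prep, and
# GradientBoostingClassifier stands in for the course's gbt/gbt2 models.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: label 1 = default, 0 = non-default
features, labels = make_classification(
    n_samples=2000, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=42,
)
df = pd.DataFrame(features, columns=[f"col_{i}" for i in range(6)])
y = pd.Series(labels, name="loan_status")

# Two hypothetical column sets, analogous to X and X2
cols_a = ["col_0", "col_1", "col_2", "col_3"]
cols_b = ["col_2", "col_3", "col_4", "col_5"]

# One shared split so that only the column selection differs
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.4, random_state=123, stratify=y
)

scores = {}
for name, cols in [("model_a", cols_a), ("model_b", cols_b)]:
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_train[cols], y_train)
    preds = model.predict(X_test[cols])
    # F-1 score for the default class only (pos_label=1)
    scores[name] = f1_score(y_test, preds, pos_label=1)

for name, score in scores.items():
    print(f"{name}: default-class F-1 = {score:.3f}")
```

The data is synthetic, so the printed scores say nothing about `cr_loan_prep`. What carries over is the pattern: the same target and split, different columns, and one default-class F-1 score per model.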

Incorrectly predicting defaults as non-defaults can lead to unexpected losses, especially when the model assigned those loans a very low probability of default. The F-1 score for the default class shows how accurately each model predicts defaults.
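The F-1 score is the harmonic mean of precision and recall. For the default class, a default predicted as non-default is a false negative, and false negatives lower recall. The small worked example below uses hypothetical counts (60 true positives, 20 false positives, 40 false negatives). It computes the score in two equivalent ways: from precision and recall, and directly as 2·TP / (2·TP + FP + FN).

```python
def f1_from_counts(tp, fp, fn):
    """F-1 score for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted defaults that were defaults
    recall = tp / (tp + fn)     # share of actual defaults that were caught
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for the default class
f1_default = f1_from_counts(tp=60, fp=20, fn=40)

# Equivalent closed form: 2*TP / (2*TP + FP + FN)
f1_alt = 2 * 60 / (2 * 60 + 20 + 40)

print(round(f1_default, 4), round(f1_alt, 4))  # 0.6667 0.6667
```

Here precision is 0.75 but recall is only 0.6. The 40 missed defaults pull the F-1 score down to about 0.67.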

The credit data, cr_loan_prep, and the two training column sets, X and X2, have been loaded in the workspace. The models gbt and gbt2 have already been trained.

This exercise is part of the course Credit Risk Modeling in Python.



Have a go at this exercise by completing this sample code.

# Predict the loan_status using each model
____ = gbt.____(____)
____ = gbt2.____(____)

# Print the classification report of the first model
target_names = ['Non-Default', 'Default']
print(____(____, ____, target_names=target_names))

# Print the classification report of the second model
print(____(____, ____, target_names=target_names))
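The reporting step of the exercise can be illustrated without the workspace data. The sketch below assumes scikit-learn's `classification_report`. `y_test` and the two prediction lists are hypothetical toy arrays, not outputs of `gbt` or `gbt2`. Passing `output_dict=True` returns the report as a dictionary, which makes the default-class F-1 scores easy to compare in code.

```python
from sklearn.metrics import classification_report

# Hypothetical true labels and predictions from two models (1 = default)
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds_model_1 = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
preds_model_2 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

target_names = ['Non-Default', 'Default']

# Printed text reports, one per model
for name, preds in [("model_1", preds_model_1), ("model_2", preds_model_2)]:
    print(name)
    print(classification_report(y_test, preds, target_names=target_names))

# Dictionary reports are keyed by the target names
report_1 = classification_report(y_test, preds_model_1,
                                 target_names=target_names, output_dict=True)
report_2 = classification_report(y_test, preds_model_2,
                                 target_names=target_names, output_dict=True)
f1_default_1 = report_1['Default']['f1-score']
f1_default_2 = report_2['Default']['f1-score']
print(round(f1_default_1, 3), round(f1_default_2, 3))  # 0.571 0.857
```

With these toy labels, the second model misses one default instead of two and produces no false alarms, so its default-class F-1 score is higher.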