Making more discriminative models

In the previous exercise, the range of predicted probabilities of default was rather narrow. As discussed, small predicted default probabilities are to be expected when default rates are low, but building bigger models (that is, models that include more predictors) can expand the range of your predictions.

Whether this will eventually lead to better predictions still needs to be validated and depends on the quality of the newly included predictors. But first, have a look at how bigger models can expand the range.

This exercise is part of the course

Credit Risk Modeling in R


Exercise instructions

Create log_model_full the same way you created log_model_small, but this time include all available predictors in the data set. Rather than typing the name of every column, you can select all variables with the formula loan_status ~ . (the dot stands for every other column in the data).
Create the prediction vector predictions_all_full for all cases in the test set using predict(). Set type = "response" so that the values are probabilities of default rather than log-odds, which is what predict() returns for a glm by default.
Look at the range of the predictions.
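
The three steps above can be sketched on simulated data. This is only an illustration, not the course solution: the data frame loans, the columns income and emp_years, the coefficients used to simulate defaults, and the names test_set and predictions_all_small are all assumptions made up for this example. Only training_set, log_model_small, log_model_full, predictions_all_full, age, ir_cat, and loan_status come from the exercise text.

```r
# Simulated stand-in for the loan data; columns and coefficients are hypothetical.
set.seed(42)
n <- 2000
ir_levels <- c("0-8", "8-11", "11-13.5", "13.5+")
loans <- data.frame(
  age       = round(runif(n, 20, 70)),
  ir_cat    = factor(sample(ir_levels, n, replace = TRUE), levels = ir_levels),
  income    = rlnorm(n, meanlog = 10.8, sdlog = 0.5),
  emp_years = sample(0:30, n, replace = TRUE)
)

# Simulate defaults from a logistic model that depends on every predictor.
linear_part <- -2.5 -
  0.01 * (loans$age - 45) +
  0.40 * (as.integer(loans$ir_cat) - 1) -
  1.00 * (log(loans$income) - 10.8) -
  0.03 * (loans$emp_years - 15)
loans$loan_status <- rbinom(n, size = 1, prob = plogis(linear_part))

# Split into a training set (two thirds) and a test set (one third).
train_idx    <- sample(n, size = round(2 / 3 * n))
training_set <- loans[train_idx, ]
test_set     <- loans[-train_idx, ]

# Small model: two predictors. Full model: "." selects every other column.
log_model_small <- glm(loan_status ~ age + ir_cat,
                       family = "binomial", data = training_set)
log_model_full  <- glm(loan_status ~ .,
                       family = "binomial", data = training_set)

# type = "response" returns probabilities; the default would be log-odds.
predictions_all_small <- predict(log_model_small, newdata = test_set,
                                 type = "response")
predictions_all_full  <- predict(log_model_full, newdata = test_set,
                                 type = "response")

# Compare the ranges; the full model will typically span a wider interval.
range(predictions_all_small)
range(predictions_all_full)
```

A wider range is not a better model by itself. As noted above, whether the extra predictors improve predictions still has to be validated.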

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Change the code below to construct a logistic regression model using all available predictors in the data set
log_model_small <- glm(loan_status ~ age + ir_cat, family = "binomial", data = training_set)

# Make PD-predictions for all test set elements using the full logistic regression model


# Look at the predictions range
