Basic logistic regression

In the video, you looked at a logistic regression model including the variable age as a predictor. Now, you will include a categorical variable, and learn how to interpret its parameter estimates.

When you include a categorical variable in a logistic regression model in R, you obtain a parameter estimate for every category except one. The category without an estimate is called the reference category. The parameter estimate for each of the other categories is the log of the odds ratio of loan default for that category relative to the reference category; exponentiating it with exp() gives the odds ratio itself. Don't worry if this doesn't make complete sense yet; you'll do more exercises on this later on!
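To make this interpretation concrete, here is a minimal sketch on made-up data. The data frame toy and the level names A, B and C are hypothetical and not part of the course. With a single categorical predictor, the fitted default probability in each level matches the observed default rate, so exp() of each estimate should equal the odds ratio computed by hand from the counts.

```r
# Hypothetical toy data, NOT the course's loan_data: a categorical
# predictor grp with three levels and known default counts per level
grp <- factor(rep(c("A", "B", "C"), each = 100))
loan_status <- c(rep(1, 10), rep(0, 90),   # level A: 10 defaults out of 100
                 rep(1, 20), rep(0, 80),   # level B: 20 defaults out of 100
                 rep(1, 40), rep(0, 60))   # level C: 40 defaults out of 100
toy <- data.frame(loan_status = loan_status, grp = grp)

# Logistic regression with the categorical variable as the only predictor;
# "A" is the first level, so it becomes the reference category
fit <- glm(loan_status ~ grp, family = "binomial", data = toy)
print(coef(fit))  # (Intercept), grpB, grpC: no separate estimate for "A"

# Exponentiated parameter estimates = odds ratios relative to "A"
or_model <- exp(coef(fit))[c("grpB", "grpC")]

# The same odds ratios computed directly from the observed default rates
odds <- tapply(toy$loan_status, toy$grp, function(y) mean(y) / (1 - mean(y)))
or_counts <- odds[c("B", "C")] / odds[["A"]]

print(or_model)
print(or_counts)
```

With these counts the odds of default are 1/9 for A, 1/4 for B and 2/3 for C, so both approaches should give odds ratios of 2.25 (B vs A) and 6 (C vs A), up to numerical precision.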

This exercise is part of the course

Credit Risk Modeling in R

Exercise instructions

  • Construct a logistic regression model called log_model_cat with the categorical variable ir_cat as the only predictor. Your call to glm() should contain three arguments:
      • loan_status ~ ir_cat
      • family = "binomial"
      • data = training_set
  • View the result in the console to see your parameter estimates.
  • Find out what the reference category is by looking at the categories of ir_cat (in the full data set loan_data) again, using the table() function. By default, R takes the first level listed as the reference category.
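The reference category is simply the first level of the factor. The short sketch below uses a hypothetical toy factor (grp is made up, not ir_cat) to show how levels() and table() reveal it, and how relevel() from base R's stats package would change it if you wanted a different reference category.

```r
# Hypothetical toy factor, NOT the course's ir_cat
grp <- factor(c("B", "A", "C", "A", "B", "C"))

# Levels are sorted alphabetically by default; the first one ("A")
# is the reference category in a glm() fit
print(levels(grp))

# table() lists the counts in level order, so its first entry is the reference
print(table(grp))

# relevel() moves the chosen level to the front, making it the new reference
grp_c <- relevel(grp, ref = "C")
print(levels(grp_c))
```

A model fitted on grp_c would report estimates for A and B relative to C instead of relative to A.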

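Sample code

The completed sample code below assumes that loan_data and training_set are already loaded in your workspace, as they are in the course.

# Build a glm model with variable ir_cat as a predictor
log_model_cat <- glm(loan_status ~ ir_cat, family = "binomial", data = training_set)

# Print the parameter estimates
log_model_cat

# Look at the different categories in ir_cat using table()
table(loan_data$ir_cat)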