Basic logistic regression
In the video, you looked at a logistic regression model that included the variable age as a predictor. Now you will include a categorical variable and learn how to interpret its parameter estimates.
When you include a categorical variable in a logistic regression model in R, you will obtain a parameter estimate for all but one of its categories. The category for which no parameter estimate is given is called the reference category. The parameter for each of the other categories is the log of the odds ratio of a loan default for that category versus the reference category; exponentiating the parameter gives the odds ratio itself. Don't worry if this doesn't make complete sense to you yet; you'll do more exercises on this later on!
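To make the reference-category idea concrete, here is a minimal sketch on a tiny made-up data set. The data frame toy, its columns default and grade, and the category labels are hypothetical and are not part of the course data. The sketch assumes R's default treatment contrasts, under which the first factor level serves as the reference category.

```r
# Hypothetical toy data: 'toy', 'default', and 'grade' are illustrative names,
# not objects from the course.
toy <- data.frame(
  default = c(0, 0, 0, 1,   0, 0, 1, 1,   0, 1, 1, 1),
  grade   = factor(rep(c("low", "mid", "high"), each = 4),
                   levels = c("low", "mid", "high"))
)

# Under R's default treatment contrasts, the first factor level is the
# reference category, so it gets no coefficient of its own.
levels(toy$grade)[1]

fit <- glm(default ~ grade, family = "binomial", data = toy)
coef(fit)   # (Intercept), grademid, gradehigh; nothing for "low"

# Coefficients are on the log-odds scale; exponentiate them to get odds
# ratios relative to the reference level "low".
odds_ratios <- exp(coef(fit))[-1]

# Cross-check against odds computed by hand from the cross-table.
tab       <- table(toy$grade, toy$default)
odds      <- tab[, "1"] / tab[, "0"]
manual_or <- odds[c("mid", "high")] / odds["low"]
```

In this toy data the odds of default are 1/3, 1, and 3 for low, mid, and high, so the hand-computed odds ratios against low are 3 and 9, and the exponentiated coefficients should agree with them up to numerical tolerance.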
This exercise is part of the course Credit Risk Modeling in R.
Exercise instructions
- Construct a logistic regression model called log_model_cat with the categorical variable ir_cat as the only predictor. Your call to glm() should contain three arguments:
  - loan_status ~ ir_cat
  - family = "binomial"
  - data = training_set
- View the result in the console to see your parameter estimates.
- Find out what the reference category is by looking at the structure of ir_cat (in the full data set loan_data) again. Use the table() function to do this.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Build a glm model with variable ir_cat as a predictor
# Print the parameter estimates
# Look at the different categories in ir_cat using table()
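
One way the sample code could be filled in is sketched below. In the exercise, loan_data and training_set are already loaded; here they are replaced by simulated stand-ins so the sketch runs on its own. The category labels A, B, and C, the default probabilities, and the 350-row split are assumptions made purely for illustration.

```r
# Stand-in data so the sketch is self-contained. The values and the category
# labels "A", "B", "C" are simulated and hypothetical, not the course data.
set.seed(1)
n <- 500
loan_data <- data.frame(
  ir_cat = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
# Made-up default probability for each category
p_default <- c(A = 0.05, B = 0.10, C = 0.20)[as.character(loan_data$ir_cat)]
loan_data$loan_status <- rbinom(n, size = 1, prob = p_default)
training_set <- loan_data[1:350, ]   # arbitrary split for illustration

# Build a glm model with variable ir_cat as a predictor
log_model_cat <- glm(loan_status ~ ir_cat, family = "binomial",
                     data = training_set)

# Print the parameter estimates
log_model_cat

# Look at the different categories in ir_cat using table()
table(loan_data$ir_cat)
```

With these stand-in data, the printed model lists coefficients for ir_catB and ir_catC only, and table() shows that A is the first category, which suggests that A is the reference category here.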