Multiple variables in a logistic regression model

The interpretation of a single parameter still holds when several variables are included in a model. When you interpret the effect of a change in one variable, it is assumed that the other variables are held constant. There is a Latin phrase for this, ceteris paribus, literally meaning "other things being equal".

To build a logistic regression model with multiple variables, you can use the + sign to add variables. Your formula will look something like:

y ~ x1 + ... + xk
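
For example, here is a minimal sketch of fitting such a model with base R's glm(), using the built-in mtcars data rather than the course data (am, hp and wt are just stand-ins for a binary outcome and two predictors):

# Fit a logistic regression with two predictors;
# family = "binomial" makes glm() fit a logistic model
fit <- glm(am ~ hp + wt, family = "binomial", data = mtcars)

# The fitted coefficients: one per predictor, plus the intercept
coef(fit)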

When evaluating the model, there are a number of things to be aware of. You already looked at the parameter values, but those are not the only thing of importance. Also important is the statistical significance of each parameter estimate. The significance of a parameter is usually expressed as a p-value; in the output of a logistic regression fitted with glm() it appears in the column Pr(>|z|). The summary() output also shows significance codes, ranging from "." for mild significance to "***" for very strong significance. When a parameter is not significant, you cannot conclude that it is different from 0. This matters because, in general, it only makes sense to interpret the effect on default for significant parameters.
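
Continuing the mtcars sketch from above, the significance levels can be read from the coefficient table returned by summary():

# Full model summary: estimates, standard errors, z values,
# Pr(>|z|) and significance codes
summary(fit)

# Or extract just the p-values from the coefficient table
coef(summary(fit))[, "Pr(>|z|)"]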

This exercise is part of the course Credit Risk Modeling in R.

Exercise instructions

  • Create a logistic regression model using the glm() function, and the training_set. Include the variables age, ir_cat, grade, loan_amnt and annual_inc. Call this model log_model_multi.
  • Obtain the significance levels using summary() in combination with your model. You will look more deeply into what significance levels mean in the next exercise!

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build the logistic regression model



# Obtain significance levels using summary()
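
If you get stuck, one possible completion of the sample code is sketched below, assuming the binary default indicator in training_set is called loan_status (check the name of the response column in your own session):

# Build the logistic regression model with multiple variables
log_model_multi <- glm(loan_status ~ age + ir_cat + grade + loan_amnt + annual_inc,
                       family = "binomial", data = training_set)

# Obtain significance levels using summary()
summary(log_model_multi)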