Exercise

# Multiple variables in a logistic regression model

The interpretation of a single parameter still holds when including several variables in a model. When you do include several variables and ask for the interpretation when a certain variable changes, it is assumed that the other variables remain constant, or unchanged. There is a Latin phrase for this, *ceteris paribus*, literally meaning "other things being equal".

To build a logistic regression model with multiple variables, you can use the `+` sign to add variables. Your formula will look something like:

```
y ~ x1 + ... + xk
```

In order to evaluate the model there are a number of things to be aware of. You already looked at the parameter estimates, but they are not the only thing of importance. Also important is the statistical significance of each parameter estimate. The significance of a parameter is often referred to as a **p-value**; in the output of a logistic regression model you will see it denoted as `Pr(>|z|)`, since `glm()` tests the coefficients with z-statistics. In the `glm()` output, mild significance is denoted by a "." and very strong significance by "***". When a parameter is not significant, this means you cannot conclude that this parameter is significantly different from 0. Statistical significance matters: in general, it only makes sense to interpret the effect on default for significant parameters.
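As a quick illustration of where these p-values and significance codes appear, here is a minimal sketch using R's built-in `mtcars` data (the exercise's loan data is not used here); the model and variables are chosen purely for demonstration:

```
# Fit a small logistic regression on built-in data: predict the binary
# transmission type (am) from horsepower (hp) and weight (wt)
log_model_demo <- glm(am ~ hp + wt, family = "binomial", data = mtcars)

# The coefficient table shows a Pr(>|z|) column with the p-values and
# significance codes (".", "*", "**", "***") in the rightmost column
summary(log_model_demo)
```

Reading the printed coefficient table row by row tells you which predictors you can meaningfully interpret.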

Instructions

**100 XP**

- Create a logistic regression model using the `glm()` function, and the `training_set`. Include the variables `age`, `ir_cat`, `grade`, `loan_amnt` and `annual_inc`. Call this model `log_model_multi`.
- Obtain the significance levels using `summary()` in combination with your model. You will look more deeply into what significance levels mean in the next exercise!
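
A sketch of what these steps could look like. Since the real `training_set` is provided by the exercise environment, a small artificial stand-in is built here so the snippet runs on its own; the response variable is assumed to be `loan_status` (1 = default), as is common in this credit-risk setting:

```
# Hypothetical stand-in for the exercise's training_set (the real data
# frame ships with the exercise environment)
set.seed(1)
n <- 200
training_set <- data.frame(
  loan_status = rbinom(n, 1, 0.15),  # 1 = default, 0 = non-default
  age         = sample(20:60, n, replace = TRUE),
  ir_cat      = factor(sample(c("0-8", "8-11", "11-13.5", "13.5+"),
                              n, replace = TRUE)),
  grade       = factor(sample(LETTERS[1:7], n, replace = TRUE)),
  loan_amnt   = round(runif(n, 500, 35000)),
  annual_inc  = round(runif(n, 20000, 120000))
)

# Build the logistic regression model with multiple variables,
# combining predictors with the + sign
log_model_multi <- glm(loan_status ~ age + ir_cat + grade +
                         loan_amnt + annual_inc,
                       family = "binomial", data = training_set)

# Obtain the parameter estimates and their significance levels
summary(log_model_multi)
```

With random data like this, most parameters will (correctly) come out as not significant; on the real loan data you should see meaningful significance codes for at least some predictors.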