
Categorical and interaction terms

1. Categorical and interaction terms

We will finish this chapter by discussing logistic regression models that contain categorical and interaction terms, and when you should consider including interaction terms in your model.

2. Categorical variables

An important aspect of logistic regression is that it lets us understand the relationship between a binary response variable and a categorical explanatory variable. The simplest such model has a single binary explanatory variable x with two levels, for example whether a test was passed, with the values yes and no. A categorical variable can also have more than two levels or groups: nominal variables, where the order of the levels does not matter, and ordinal variables, where it does.
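A categorical variable with k levels usually enters a logistic model as k - 1 indicator (dummy) columns, with the omitted level serving as the reference group; a binary yes/no variable is the special case of a single 0/1 column. The sketch below shows this in plain Python. The helper `dummy_encode`, the variable "color" and its levels are hypothetical and chosen only for illustration.

```python
def dummy_encode(values, levels):
    """Return one 0/1 column per non-reference level.

    levels[0] is treated as the reference group, so it gets no column.
    """
    non_reference = levels[1:]
    return [[1 if v == level else 0 for level in non_reference] for v in values]

# Hypothetical nominal variable "color" with three levels; "red" is the reference.
levels = ["red", "green", "blue"]
rows = dummy_encode(["red", "blue", "green", "blue"], levels)
for row in rows:
    print(row)

# A binary yes/no variable needs only one indicator column ("no" is the reference).
print(dummy_encode(["yes", "no"], ["no", "yes"]))
```

A row of all zeros identifies the reference group, so each dummy coefficient is read as a difference in log-odds relative to that group.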

3. Analysis of covariance

Let's consider the case of one binary and one continuous explanatory variable, which constitutes the simplest analysis of covariance (ANCOVA) model. We can write out the familiar logit model

4. Analysis of covariance

where we model the probability that y equals 1 conditional on X, and X represents the explanatory variables x1 (binary) and x2 (continuous).
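In standard notation, assuming x1 is the binary variable and x2 the continuous one, this model can be written as:

```latex
\operatorname{logit}\big(P(y = 1 \mid X)\big)
  = \log \frac{P(y = 1 \mid X)}{1 - P(y = 1 \mid X)}
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2
```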

5. Analysis of covariance

Because x1 is binary, we can also write the model separately for each of its two values. If x1 equals 0, the term beta1 x1 is zero, so only beta0 and beta2 remain: the intercept is beta0 and the slope of x2 is beta2.

6. Analysis of covariance

Similarly, if x1 equals 1, the term beta1 x1 equals beta1 and is absorbed into the intercept, which becomes beta0 + beta1, while the slope of x2 is still beta2.
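Substituting x1 = 0 and x1 = 1 into the logit model beta0 + beta1 x1 + beta2 x2 gives two straight lines in x2 on the logit scale:

```latex
\begin{aligned}
x_1 = 0:&\quad \operatorname{logit}\big(P(y = 1 \mid X)\big) = \beta_0 + \beta_2 x_2 \\
x_1 = 1:&\quad \operatorname{logit}\big(P(y = 1 \mid X)\big) = (\beta_0 + \beta_1) + \beta_2 x_2
\end{aligned}
```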

7. Assumptions

Given the formulation from the previous slide, the model assumes that, for each value of x1, the relationship between x2 and the logit is linear, and that the two lines have equal slopes, meaning they are parallel.

8. Assumptions

Notice that the only difference between the two equations is the intercept, which differs by beta1.
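To see why the lines are parallel, the sketch below uses made-up coefficients for the main-effects model and computes the log-odds gap between the two groups at several values of x2. The coefficient values are hypothetical; under this model the gap is always beta1, whatever x2 is.

```python
# Hypothetical coefficients for logit = beta0 + beta1*x1 + beta2*x2
# (illustrative values, not estimates from any data set).
beta0, beta1, beta2 = -1.0, 0.8, 0.5

def logit(x1, x2):
    """Log-odds of y = 1 under the main-effects (no interaction) model."""
    return beta0 + beta1 * x1 + beta2 * x2

for x2 in [0.0, 1.0, 2.0, 5.0]:
    # The x2 terms cancel, so the gap between the groups is beta1 at every x2.
    gap = logit(1, x2) - logit(0, x2)
    print(f"x2={x2}: gap between groups = {gap:.2f}")
```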

9. Assumptions

Therefore, even when the relationship with x2 is linear, this model will fail if the two lines are not parallel, as in the second figure. How can we account for this?

10. Interactions

If the lines are not parallel, the variables interact. An interaction is present when the effect of x1 on the response depends on the level of x2, and vice versa. To account for the interaction and allow the slopes to differ between the groups of x1, we estimate an extra parameter, beta3, for the product term x1x2, which is usually called the interaction between x1 and x2. Note that this model does not force the lines to be parallel. As in the previous slides, we can rewrite the model separately for each value of the binary variable x1.
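With the product term added, the logit model with an interaction between x1 and x2 reads:

```latex
\operatorname{logit}\big(P(y = 1 \mid X)\big)
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
```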

11. Interactions

So when x1 is 0, both the beta1 x1 and beta3 x1x2 terms are zero, and only beta0 and beta2 remain.

12. Interactions

Similarly, when x1 equals 1, the intercept becomes beta0 + beta1 and the slope of x2 becomes beta2 + beta3.
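Substituting x1 = 0 and x1 = 1 into the interaction model beta0 + beta1 x1 + beta2 x2 + beta3 x1x2 gives one line per group, now with different intercepts and different slopes:

```latex
\begin{aligned}
x_1 = 0:&\quad \operatorname{logit}\big(P(y = 1 \mid X)\big) = \beta_0 + \beta_2 x_2 \\
x_1 = 1:&\quad \operatorname{logit}\big(P(y = 1 \mid X)\big) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\, x_2
\end{aligned}
```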

13. Interactions

Notice that, with the interaction term included, both the intercept and the slope now differ between the two groups.

14. Visualizing interactions

To summarize, an interaction allows both the intercept and the slope to differ between groups: beta1 is the difference between the two intercepts and beta3 is the difference between the two slopes.
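The sketch below splits an interaction model into its two group-specific lines and checks the summary above: the intercepts differ by beta1, the slopes differ by beta3, and the gap between the groups at a given x2 is beta1 + beta3 x2, so the effect of x1 depends on x2. The coefficient values and the helper names `GroupLine` and `group_lines` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GroupLine:
    intercept: float
    slope: float

    def logit(self, x2):
        """Log-odds of y = 1 at a given x2 for this group."""
        return self.intercept + self.slope * x2

def group_lines(beta0, beta1, beta2, beta3):
    """Split logit = b0 + b1*x1 + b2*x2 + b3*x1*x2 into one line per value of binary x1."""
    line0 = GroupLine(intercept=beta0, slope=beta2)                  # x1 = 0
    line1 = GroupLine(intercept=beta0 + beta1, slope=beta2 + beta3)  # x1 = 1
    return line0, line1

# Hypothetical coefficients, for illustration only.
beta0, beta1, beta2, beta3 = -1.0, 0.8, 0.5, -0.3
line0, line1 = group_lines(beta0, beta1, beta2, beta3)
print("intercept difference:", line1.intercept - line0.intercept)  # beta1, up to rounding
print("slope difference:", line1.slope - line0.slope)              # beta3, up to rounding

for x2 in [0.0, 1.0, 4.0]:
    # With an interaction, the gap between the groups changes with x2.
    print(f"x2={x2}: gap = {line1.logit(x2) - line0.logit(x2):.2f}")
```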

15. Interaction types

Depending on the variable types involved, there are many different types of interaction. The main concepts are the same for every type, but special care is needed for interactions among more than two variables, interactions between two continuous variables, and interactions between two nominal or ordinal variables, because their interpretation becomes more complex.
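One reason the interpretation grows more complex is that the number of interaction parameters grows with the number of levels. The sketch below builds design-matrix rows for a hypothetical nominal variable with three levels (A as the reference) interacting with a continuous variable x: it needs two dummy columns and two product columns, and each product coefficient is a slope difference relative to the reference group. The helper `design_row` and the level names are assumptions made for illustration.

```python
def design_row(group, x, levels):
    """Build one design-matrix row for a nominal x continuous interaction.

    Columns: intercept, one dummy per non-reference level, x,
    then one dummy*x product per non-reference level.
    """
    dummies = [1.0 if group == level else 0.0 for level in levels[1:]]
    products = [d * x for d in dummies]
    return [1.0, *dummies, x, *products]

levels = ["A", "B", "C"]  # hypothetical nominal variable; "A" is the reference group
for group, x in [("A", 2.0), ("B", 2.0), ("C", 3.5)]:
    print(group, design_row(group, x, levels))
```

With k levels this layout uses k - 1 dummy columns and k - 1 interaction columns, so a three-level variable already adds four parameters on top of the intercept and the slope of x.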

16. Let's practice!

Now it is your turn to put these concepts into practice with some hands-on exercises.
