Categorical and interaction terms
1. Categorical and interaction terms
We will finish this chapter with a discussion of logistic regression with categorical and interaction terms, and of when you need to consider including interaction terms in your model.
2. Categorical variables
One of the important aspects of logistic regression is that it helps us understand the relationship between a binary response variable and a categorical explanatory variable. The simplest logistic model has a single binary explanatory variable x with two levels, for example whether a test was passed, with values yes and no. A categorical variable can also have more than two levels or groups: nominal variables, where the order is not important, and ordinal variables, where the order is important.
3. Analysis of covariance
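Before moving on, it may help to see how the categorical variables described above can enter a model numerically. The sketch below uses 0/1 indicator (dummy) coding with a reference level, which is one common approach and an assumption here, since the transcript does not say which coding is used. The helper `dummy_code` and the example data are hypothetical.

```python
def dummy_code(values, reference):
    """Turn a list of category labels into 0/1 indicator columns.

    The reference level gets no column of its own: it is the group
    for which every indicator is 0.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical data: a binary variable and a three-level nominal variable.
passed = ["yes", "no", "yes", "no"]
color = ["red", "green", "blue", "red"]

passed_coded = dummy_code(passed, reference="no")
color_coded = dummy_code(color, reference="blue")

print(passed_coded)  # {'yes': [1, 0, 1, 0]}
print(color_coded)   # {'green': [0, 1, 0, 0], 'red': [1, 0, 0, 1]}
```

A binary variable needs a single indicator, while a variable with k levels needs k - 1 indicators.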
Let's consider the case where we have one binary and one continuous explanatory variable, which constitutes the simplest analysis of covariance model. We can write out the familiar logit model
4. Analysis of covariance
where we model the probability that y equals 1 conditional on X, where X represents the explanatory variables x1 and x2.
5. Analysis of covariance
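Written out, the logit model described here would take the following form. This is a reconstruction from the spoken description, not a copy of the slide. It assumes x1 is the binary variable with coefficient beta1 and x2 the continuous variable with coefficient beta2, as the later slides indicate.

```latex
\operatorname{logit}\bigl(P(y = 1 \mid X)\bigr)
  = \log\frac{P(y = 1 \mid X)}{1 - P(y = 1 \mid X)}
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2
```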
Since x1 is binary, we can also write the model separately for each of its two values. If x1 is equal to 0, the beta1 term drops out, so only beta0 and beta2 remain.
6. Analysis of covariance
Similarly, if x1 is equal to 1, the beta1 term stays in the model and is added to the intercept, while the slope on x2 is still beta2.
7. Assumptions
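The two cases just described, x1 equal to 0 and x1 equal to 1, can be written side by side. This is a sketch reconstructed from the description, using the same assumed notation as the rest of the transcript.

```latex
\begin{aligned}
x_1 = 0:&\quad \operatorname{logit}\bigl(P(y = 1 \mid X)\bigr) = \beta_0 + \beta_2 x_2 \\
x_1 = 1:&\quad \operatorname{logit}\bigl(P(y = 1 \mid X)\bigr) = (\beta_0 + \beta_1) + \beta_2 x_2
\end{aligned}
```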
Given the formulation from the previous slide, the assumptions are that for each value of x1 the relationship between x2 and the logit is linear, and that the two lines have equal slopes, meaning that they are parallel.
8. Assumptions
Notice that the only difference between the two equations is in the intercept.
9. Assumptions
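As a quick numerical check of the equal-slope assumption, the minimal sketch below evaluates the model without an interaction term for both groups. The coefficient values are hypothetical, chosen only for illustration, and the function name is not from the course.

```python
# Hypothetical coefficients chosen only for illustration.
beta0, beta1, beta2 = -1.0, 0.5, 0.25


def logit_no_interaction(x1, x2):
    """Linear predictor of the model without an interaction term."""
    return beta0 + beta1 * x1 + beta2 * x2


x2_values = [0.0, 1.0, 2.0, 4.0]

# Vertical gap between the x1 = 1 line and the x1 = 0 line at each x2.
gaps = [logit_no_interaction(1, x2) - logit_no_interaction(0, x2)
        for x2 in x2_values]

print(gaps)  # [0.5, 0.5, 0.5, 0.5]
```

The gap equals beta1 at every value of x2, which is what parallel lines on the logit scale mean.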
Therefore, even assuming linearity in x2, the model will fail if the two lines are not parallel, as in the second figure. How can we account for this?
10. Interactions
If the lines are not parallel, then there is an interaction between the variables. An interaction occurs when the effect of x1 on the response depends on the level of x2, and vice versa. To account for the interaction and allow the slopes to differ between the groups of x1, we need to estimate an extra parameter beta3 for the term x1x2. The term x1x2 is usually called the interaction between x1 and x2. Note that this model does not force the lines to be parallel. As in the previous slides, we can rewrite the logistic model in different ways depending on the value of the binary variable x1.
11. Interactions
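With the extra parameter beta3 on the term x1x2, the model described above would read as follows. This is again a reconstruction from the spoken description rather than the slide itself.

```latex
\operatorname{logit}\bigl(P(y = 1 \mid X)\bigr)
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
```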
So when x1 is 0, the terms containing x1 are zero and we are left with beta0 and beta2.
12. Interactions
Similarly, when x1 equals 1, beta1 is added to the intercept and beta3 is added to the slope on x2.
13. Interactions
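Setting x1 to 0 and then to 1 in the interaction model gives the two group equations described above. This is a sketch in the same assumed notation.

```latex
\begin{aligned}
x_1 = 0:&\quad \operatorname{logit}\bigl(P(y = 1 \mid X)\bigr) = \beta_0 + \beta_2 x_2 \\
x_1 = 1:&\quad \operatorname{logit}\bigl(P(y = 1 \mid X)\bigr) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\, x_2
\end{aligned}
```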
Notice that now, with the inclusion of the interaction term, both the intercept and the slope have changed.
14. Visualizing interactions
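The change in both intercept and slope can be checked numerically. The sketch below recovers the intercept and slope of the logit line for each group of x1. The coefficient values and function names are hypothetical and serve only as an illustration.

```python
# Hypothetical coefficients chosen only for illustration.
beta0, beta1, beta2, beta3 = -1.0, 0.5, 0.25, 0.75


def logit_with_interaction(x1, x2):
    """Linear predictor with an x1*x2 interaction term."""
    return beta0 + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2


def line_for_group(x1):
    """Return (intercept, slope) of the logit as a function of x2."""
    intercept = logit_with_interaction(x1, 0.0)
    slope = logit_with_interaction(x1, 1.0) - intercept
    return intercept, slope


intercept0, slope0 = line_for_group(0)
intercept1, slope1 = line_for_group(1)

print(intercept0, slope0)  # -1.0 0.25
print(intercept1, slope1)  # -0.5 1.0
print(intercept1 - intercept0, slope1 - slope0)  # 0.5 0.75
```

The last line shows the two differences matching beta1 and beta3. Plotting each group's line over a range of x2 values would show two lines that are no longer parallel.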
To summarize, interactions allow both the intercept and the slope to differ between groups, where beta1 is the difference between the two intercepts and beta3 is the difference between the two slopes.
15. Interaction types
Given different variable data types, we can have many different interaction types. The main concepts remain the same for any interaction type, but special care should be taken with interactions among more than two variables, interactions between two continuous variables, and interactions where both variables are nominal or ordinal, since the interpretations become more complex.
16. Let's practice!
Now it is your turn to put these concepts into practice with some hands-on exercises.