Interactions

1. Interactions

In earlier chapters, we said that linear regression assumes inputs affect the outcome linearly and additively. Now we will start to look at cases that violate these assumptions. In this lesson, you will learn about variable interactions.

2. Additive relationships

Before we jump into variable interactions, let's review an example of an additive relationship. Imagine plant height as a function of sunlight and the bacteria in the soil. If the effects of bacteria and sunlight are additive, then the change in plant height is the sum of the separate effects of bacteria and sunlight. This means a given change in sunlight will cause the same change in plant height, no matter what the level of bacteria in the soil, and vice versa.
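As a minimal sketch of what additivity means, the R snippet below uses a made-up height function. The coefficients and the name `height` are illustrative assumptions, not values from the lesson.

```r
# Hypothetical additive relationship; the coefficients are invented for illustration.
height <- function(sun, bacteria) 2 + 3 * sun + 1.5 * bacteria

# Effect of raising sunlight from 1 to 2, evaluated at three bacteria levels
bacteria_levels <- c(0, 5, 10)
sun_effect <- height(2, bacteria_levels) - height(1, bacteria_levels)

# The change in height is the same at every bacteria level
print(sun_effect)
```

Because the two inputs enter the function as separate terms, the bacteria level cancels out of the difference.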

3. What is an Interaction?

However, if the effect of bacteria differs depending on the level of sunlight, then there is an interaction between bacteria and sunlight, written here with a colon between the two variable names. A variable interaction occurs when the simultaneous effect of two variables on the outcome is not additive.

4. What is an Interaction?

Suppose sun is a categorical variable with two values, sun and shade. An interaction between sun and bacteria means that the change in height due to a change in bacteria will be different in sun than in shade. It's like having two bacteria models: one for sun and one for shade.
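The "two models" intuition can be checked with a small simulation. This is a sketch on invented data: the data frame `plants`, its columns, and the true slopes are all assumptions made for illustration.

```r
set.seed(1)
n <- 100
plants <- data.frame(
  bacteria = runif(n, 0, 10),
  sun = factor(rep(c("shade", "sun"), each = n / 2))
)

# Invented truth: bacteria helps more in sun (slope 2) than in shade (slope 0.5)
true_slope <- ifelse(plants$sun == "sun", 2, 0.5)
plants$height <- 5 + true_slope * plants$bacteria + rnorm(n, sd = 0.5)

# One model with main effects and an interaction
fit <- lm(height ~ bacteria * sun, data = plants)
slope_shade <- coef(fit)[["bacteria"]]
slope_sun <- slope_shade + coef(fit)[["bacteria:sunsun"]]

# Two separate bacteria models, one per level of sun
fit_shade <- lm(height ~ bacteria, data = subset(plants, sun == "shade"))
fit_sun <- lm(height ~ bacteria, data = subset(plants, sun == "sun"))

# The per-group slopes from the single model match the separate fits
print(c(shade = slope_shade, sun = slope_sun))
print(c(shade = coef(fit_shade)[["bacteria"]], sun = coef(fit_sun)[["bacteria"]]))
```

In the single model, the coefficient on `bacteria` is the slope for the baseline level (shade), and the interaction coefficient is the extra slope in sun.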

5. Example of no Interaction: Soybean Yield

Here we see a plot of soybean plant yield as a function of water stress and the levels of ozone and sulfur dioxide present. Stress has two levels: "stressed" and "well-watered". The slope of yield against ozone is about the same for stressed and well-watered plants, and the same holds for sulfur dioxide. This suggests that there is no appreciable interaction between stress and either ozone or sulfur dioxide on soybean yield.

6. Example of an Interaction: Alcohol Metabolism

In this plot, we see alcohol metabolism as a function of gastric dehydrogenase activity for men and women. The slopes of the relationships are quite different by gender, indicating that there is an interaction between gastric activity and gender on alcohol metabolism.

7. Expressing Interactions in Formulae

To fit a linear or additive model with interactions in R, you must specify the interaction explicitly. In a formula, a colon expresses the interaction between two variables. The asterisk is shorthand for the main effects of a and b plus the interaction between them. Since the asterisk is also the multiplication symbol in R, you must wrap the expression in the I function when you want the arithmetic product "a times b" as a term in a formula.
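One way to see what each formula expands to is to inspect the columns of its design matrix. The sketch below uses base R's `model.matrix` on a toy data frame; the data and the names `a`, `b`, and `y` are invented for illustration.

```r
# Toy data with invented values
d <- data.frame(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3), y = c(3, 5, 20, 19))

# Design matrices for four ways of combining a and b
mm_additive <- model.matrix(y ~ a + b, data = d)     # main effects only
mm_colon    <- model.matrix(y ~ a:b, data = d)       # interaction only
mm_star     <- model.matrix(y ~ a * b, data = d)     # main effects + interaction
mm_product  <- model.matrix(y ~ I(a * b), data = d)  # arithmetic product as one term

print(colnames(mm_additive))  # "(Intercept)" "a" "b"
print(colnames(mm_colon))     # "(Intercept)" "a:b"
print(colnames(mm_star))      # "(Intercept)" "a" "b" "a:b"
print(colnames(mm_product))   # "(Intercept)" "I(a * b)"
```

For two numeric inputs, `a:b` and `I(a * b)` produce the same column of products. `I()` is needed whenever an operator such as `*`, `+`, or `^` should keep its arithmetic meaning inside a formula.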

8. Finding the Correct Interaction Pattern

Unfortunately, using the wrong pattern of main effects and interactions in a linear regression can lead to a suboptimal model, as well as incorrect interpretations of how the inputs affect the outcome. In this course, we are interested in prediction rather than interpretation, so we will use cross-validation to estimate the future prediction performance of three possible alcohol metabolism models: a model with no interactions, a model with both main effects and the interaction, and a model with only a main effect for gastric activity and an interaction between sex and gastric activity. In this case, the third model performs best. In future chapters, you will learn about regression methods that can learn certain types of interactions from the data, so you don't have to encode them yourself.
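The comparison described above could be sketched as follows, using plain k-fold cross-validation in base R. The data here are simulated, not the lesson's alcohol data; the data frame `sim`, its column names, and the true slopes are assumptions for illustration, so the resulting ranking need not match the one reported in the lesson.

```r
set.seed(42)
n <- 60
sim <- data.frame(
  gastric = runif(n, 0.8, 5),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
# Invented truth: the slope on gastric activity depends on sex
sim$metabol <- ifelse(sim$sex == "Male", 2.5, 1) * sim$gastric + rnorm(n, sd = 1)

# The three candidate patterns of main effects and interactions
formulas <- list(
  no_interaction       = metabol ~ gastric + sex,
  main_and_interaction = metabol ~ gastric * sex,
  gastric_and_interact = metabol ~ gastric + gastric:sex
)

# Assign each row to one of k folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Cross-validated RMSE: every row is predicted by a model that never saw it
cv_rmse <- function(fmla) {
  pred <- numeric(n)
  for (i in seq_len(k)) {
    held_out <- fold == i
    fit <- lm(fmla, data = sim[!held_out, ])
    pred[held_out] <- predict(fit, newdata = sim[held_out, ])
  }
  sqrt(mean((sim$metabol - pred)^2))
}

rmse <- sapply(formulas, cv_rmse)
print(rmse)
```

The same fold assignment is reused for all three formulas, so the models are compared on identical train and test splits.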

9. Let's practice!

Now let's practice modeling interactions.