
Creating a Regression Model With Interaction Effects: Part 2, Mediating and Moderating Effects

A CATE (conditional average treatment effect) is basically an example of statistical moderation (also known as an interaction effect), where the effect of one independent variable depends on the value of a second independent variable. A good example of moderation of a causal effect can be seen in your sink faucets. When your water valves are shut, water does not come out, but when you open your valves, water comes out. The valves are not the direct cause of water coming out of your pipes (pressure is), but the valves moderate the relationship between the pressure in your pipes and the flow at your sink.

In other words, statistical moderation occurs when the size of one independent variable's effect on an outcome depends on a second independent variable. In this example, we will find that gender moderates the effect of the treatment (downsizing HR) on someone's intention to leave Unter Technology. With the dataframe `UnterHR`, construct three regression models: one that naively estimates the average treatment effect of reducing the size of Unter's HR department on employee turnover, a second that includes a statistical interaction (to allow for moderation) between treatment and gender (`Female`), and a third that includes interactions between treatment and gender (`Female`) and between treatment and race (`Race`).
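One common way to write moderation down is a linear model with a product term. The symbols below are notation introduced here for illustration ($Y$ for `LeaveJob`, $T$ for `Treatment`, $F$ for `Female`) and are not taken from the course; the sketch assumes $T$ and $F$ are coded 0/1.

```latex
Y = \beta_0 + \beta_1 T + \beta_2 F + \beta_3 (T \times F) + \varepsilon

% Effect of the treatment within each group:
\text{effect of } T \text{ when } F = 0: \quad \beta_1
\text{effect of } T \text{ when } F = 1: \quad \beta_1 + \beta_3
```

Under this notation, $\beta_3$ is the difference between the two groups' treatment effects. If $\beta_3 = 0$, there is no moderation and the model reduces to the additive one.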

This exercise is part of the course

Causal Inference with R - Regression


Exercise instructions

  • 1) Construct a regression model that measures the effect of Treatment on LeaveJob, controlling for Female (added as an additive term)
  • 2) Construct a regression model that measures the effect of Treatment on LeaveJob, controlling for Female and with an interaction effect between Treatment and Female
  • 3) Construct a regression model that measures the effect of Treatment on LeaveJob, controlling for Female and Race, with interaction effects between Treatment and Female and between Treatment and Race

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# 1) First, let's construct a naive regression model that regresses `LeaveJob` on `Treatment` and `Female` in the dataframe `UnterHR`. We recommend wrapping the glm call in the summary function.

    summary(glm( ))

# Note: The model you just created does not allow the treatment to affect women differently; it only allows women to have a different baseline chance of leaving their job, regardless of the treatment. From the output above, it seems that being a woman increases one's likelihood of leaving Unter, but the treatment does not.
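To make the additive specification concrete, here is a minimal sketch on simulated data. The data frame `sim_hr` and all of its values are hypothetical stand-ins, not the course's `UnterHR`; only the column names follow the exercise. The call uses `glm` with its default family, as in the exercise's own syntax hint. If `LeaveJob` is a 0/1 outcome, `family = binomial` may be the more appropriate choice.

```r
# Simulated stand-in for UnterHR; column names follow the exercise,
# but every value here is made up for illustration.
set.seed(42)
n <- 500
sim_hr <- data.frame(
  Treatment = rbinom(n, 1, 0.5),
  Female    = rbinom(n, 1, 0.5)
)

# Outcome built so that Female shifts the baseline and Treatment
# adds the same amount for everyone (no moderation by construction).
sim_hr$LeaveJob <- 2 + 0.5 * sim_hr$Female + 0.3 * sim_hr$Treatment + rnorm(n)

# Additive model: a single Treatment coefficient shared by men and women.
additive_fit <- glm(LeaveJob ~ Treatment + Female, data = sim_hr)
summary(additive_fit)
```

Because the formula uses `+`, the model has exactly one coefficient for `Treatment`, so it cannot report a different treatment effect for women than for men.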


# 2) In order to find out if being a woman moderates the treatment effect (i.e. the treatment influences men and women differently), we need to add an interaction between Female and Treatment. Construct a regression model that measures the effect of `Treatment` on `LeaveJob`, with a moderation effect for `Female`. For reference, the syntax for glm with an interaction effect is "glm(Y~X*Z,data=df)". Compared with the additive model in step 1 (what this exercise calls a mediation effect), the only syntax change for moderation is that X and Z are joined with `*` rather than `+`. In an R formula, X*Z is shorthand for X + Z + X:Z, where X:Z is the interaction term.
 


# Note: By default, R includes the individual terms for Treatment and Female when we specify the interaction with `*`. The coefficient for Treatment indicates the effect of Treatment for men (Female = 0); the coefficient for Female indicates the difference between women and men in the untreated group (Treatment = 0). The Treatment:Female coefficient indicates how much the effect of Treatment differs for women compared with men, so the effect of Treatment for women is the sum of the Treatment and Treatment:Female coefficients. Now the coefficients for Treatment and Female are no longer statistically significant. There is a large coefficient for the interaction term, which suggests that the Treatment has a much larger effect on women's odds of leaving their job than on men's.
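The reading of the interaction coefficients can be checked on simulated data. This is a sketch under stated assumptions: `sim_hr` is a hypothetical stand-in for `UnterHR`, the outcome is simulated so that the treatment only matters for women, and `glm` is left at its default family as in the exercise's syntax hint.

```r
# Simulated stand-in for UnterHR (all values are made up for illustration).
set.seed(42)
n <- 500
sim_hr <- data.frame(
  Treatment = rbinom(n, 1, 0.5),
  Female    = rbinom(n, 1, 0.5)
)

# By construction, Treatment raises LeaveJob only for women.
sim_hr$LeaveJob <- 2 + 0.8 * sim_hr$Treatment * sim_hr$Female + rnorm(n)

# Treatment * Female expands to Treatment + Female + Treatment:Female.
interaction_fit <- glm(LeaveJob ~ Treatment * Female, data = sim_hr)
b <- coef(interaction_fit)

# Treatment effect within each group, read off the coefficients.
effect_men   <- b[["Treatment"]]
effect_women <- b[["Treatment"]] + b[["Treatment:Female"]]

# Cross-check: difference in mean LeaveJob between treated and
# untreated women, computed directly from the data.
women <- subset(sim_hr, Female == 1)
diff_women <- mean(women$LeaveJob[women$Treatment == 1]) -
  mean(women$LeaveJob[women$Treatment == 0])

c(men = effect_men, women = effect_women, women_raw = diff_women)
```

With two binary predictors and their interaction, the model fits each of the four group means exactly, so `effect_women` matches the raw difference in means among women. The `Treatment:Female` coefficient on its own is the gap between the women's and men's treatment effects.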
      
      
# 3) Let's now answer the final question for this exercise. Create a glm model that includes an interaction effect between Treatment and Female, and an interaction effect between Treatment and Race. The syntax is much like above, but now we add a second pair of interacting variables (e.g. "glm(Y~X*Z+W*Z,data=df)", where Z is the variable shared by both interactions, here Treatment).
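A two-interaction formula can be sketched the same way. Everything about the data here is an assumption: `sim_hr` is simulated, and `Race` is treated as a factor with three placeholder levels (`GroupA`, `GroupB`, `GroupC`) that are hypothetical and not taken from `UnterHR`.

```r
# Simulated stand-in for UnterHR (all values are made up for illustration).
set.seed(42)
n <- 600
sim_hr <- data.frame(
  Treatment = rbinom(n, 1, 0.5),
  Female    = rbinom(n, 1, 0.5),
  # Placeholder categories; the real coding of Race is not known here.
  Race      = factor(sample(c("GroupA", "GroupB", "GroupC"), n, replace = TRUE))
)
sim_hr$LeaveJob <- 2 + 0.8 * sim_hr$Treatment * sim_hr$Female + rnorm(n)

# Treatment interacts with Female and, separately, with Race.
# The repeated Treatment main effect is only included once by R.
two_way_fit <- glm(LeaveJob ~ Treatment * Female + Treatment * Race,
                   data = sim_hr)

# Inspect which terms the formula produced.
names(coef(two_way_fit))
```

With a three-level factor, R creates one `Treatment:Race...` coefficient per non-reference level, each describing how the treatment effect in that group differs from the reference group.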

