Creating a Regression Model With Interaction Effects: Part 1
An experiment conducted by the transportation network company Unter Technologies tried to determine whether downsizing its Human Resources (HR) department would lead to higher employee turnover. Using t-tests, the company found that downsizing had different conditional average treatment effects (CATEs) for men and for women. This was determined by separately estimating the average treatment effect (ATE) of downsizing on a sample of only men and on a sample of only women.
However, slicing data this way can substantially reduce one's statistical power, and it becomes unwieldy when a data scientist wants to estimate more than a couple of CATEs from a given sample. For example, if race is also an important factor in whether employees plan to leave Unter after it reduces its HR department, we would need to slice the data into even smaller samples and run a separate t-test on each one. One of the major benefits of regression models is that we can use all of the data points to find an answer, but it also means our interpretations can be more open to debate. Let's see what a regression model tells us about the treatment effect in this experiment.
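To make the cost of slicing concrete, here is a minimal sketch of the subgroup approach described above. It does not use the course data: `UnterHR_sim` is a hypothetical, simulated stand-in, the effect sizes are invented for illustration, and treating `LeaveJob` as a numeric score is an assumption.

```r
# Simulated stand-in for the course data. Every number here is made up
# for illustration and says nothing about the real UnterHR dataset.
set.seed(42)
n <- 400
UnterHR_sim <- data.frame(
  Treatment = rbinom(n, 1, 0.5),  # 1 = HR department downsized (assumed coding)
  Female    = rbinom(n, 1, 0.5)   # 1 = female, 0 = male (assumed coding)
)
# Hypothetical numeric outcome with a larger treatment effect for women.
UnterHR_sim$LeaveJob <- 4 +
  1.0 * UnterHR_sim$Treatment +
  1.5 * UnterHR_sim$Treatment * UnterHR_sim$Female +
  rnorm(n)

# Slice the sample, then run one t-test per slice.
men   <- subset(UnterHR_sim, Female == 0)
women <- subset(UnterHR_sim, Female == 1)
cate_men   <- t.test(LeaveJob ~ Treatment, data = men)
cate_women <- t.test(LeaveJob ~ Treatment, data = women)

# Group means within each slice; their difference is that slice's estimate.
cate_men$estimate
cate_women$estimate

# Each test only sees part of the sample.
nrow(men)
nrow(women)
```

Adding a third slicing variable such as race would split each of these subsamples again, which is where the loss of power comes from.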
This exercise is part of the course
Causal Inference with R - Regression
Exercise instructions
- 1) Examine the structure of the dataframe UnterHR
- 2) Look at summary statistics for each variable
- 3) Construct a regression model that measures the effect of Treatment on LeaveJob, with a mediation effect for Female
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# 1) Before running a regression model, let's examine the structure of the dataframe `UnterHR` with the str() command:
# 2) The dataset contains four variables: Treatment, Female, LeaveJob, and Race. Notice that the structure command refers to race as a "Factor" variable. This is a typical way that R identifies nominal (non-numeric) variables. In 4 separate statements, look at the summary statistics about each variable:
# 3) Remember that Unter found that downsizing their HR department had a bigger CATE for women than for men, so let's construct a regression model that measures the effect of Treatment on LeaveJob, with a "mediation effect" for Female. This is the same as "controlling" for Female, as we did for other independent variables in our previous examples. Then run the summary() command to further understand the results.
Solution3<-
summary()
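For reference, here is one way the three steps might look when run end to end. This is a sketch under stated assumptions, not the course's official solution: `UnterHR_demo` is a hypothetical simulated dataframe standing in for `UnterHR`, the race labels and effect sizes are invented, and "controlling" for Female is read as adding it as an additive term in `lm()`.

```r
# Hypothetical stand-in for UnterHR, simulated so the sketch runs on its own.
set.seed(1)
n <- 300
UnterHR_demo <- data.frame(
  Treatment = rbinom(n, 1, 0.5),
  Female    = rbinom(n, 1, 0.5),
  Race      = factor(sample(c("GroupA", "GroupB", "GroupC"), n, replace = TRUE))
)
# Invented numeric outcome; the real LeaveJob variable may be coded differently.
UnterHR_demo$LeaveJob <- 4 + 1.2 * UnterHR_demo$Treatment +
  0.5 * UnterHR_demo$Female + rnorm(n)

# 1) Structure of the dataframe: Race should show up as a Factor.
str(UnterHR_demo)

# 2) Summary statistics, one statement per variable.
summary(UnterHR_demo$Treatment)
summary(UnterHR_demo$Female)
summary(UnterHR_demo$LeaveJob)
summary(UnterHR_demo$Race)

# 3) Effect of Treatment on LeaveJob, controlling for Female (additive term).
Solution3_demo <- lm(LeaveJob ~ Treatment + Female, data = UnterHR_demo)
summary(Solution3_demo)
```

With the real data, the same calls would take `UnterHR` in place of `UnterHR_demo`. The coefficient on Treatment is then read as the estimated treatment effect holding Female constant.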