
Communication Skills in Video Games: Propensity Score Matching in R

The researchers studying how playing NERD affects communication skills knew their sample was highly unbalanced, so they suspected that matching techniques might be required. Using the NERD dataset, apply matching techniques to better estimate the average treatment effect of playing NERD on communication skills. Regression models alone aren't always convincing for measuring causal effects in unbalanced data (even under unconfoundedness). A more robust way to test the effect of our treatment on communication skills is through matching methods. Matching methods balance the treatment group against the control group so that the two groups are more similar on the observed control variables.
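To make the idea of "balancing" concrete, here is a minimal base-R sketch of propensity score matching. Everything in it is an assumption for illustration: the data are simulated (this is not the NERD dataset), the variable names `age` and `hours` are hypothetical, and the greedy one-to-one matching loop is a simplified stand-in rather than the algorithm any particular package uses.

```r
# Simulated stand-in data (NOT the course's NERD dataset; names are hypothetical)
set.seed(42)
n     <- 500
age   <- rnorm(n, mean = 30, sd = 8)
hours <- rnorm(n, mean = 10, sd = 4)

# Treatment is more likely for younger people who play more hours,
# which makes the treated and control groups unbalanced
p_true <- plogis(-1.5 - 0.08 * (age - 30) + 0.20 * (hours - 10))
treat  <- rbinom(n, 1, p_true)
sim    <- data.frame(treat, age, hours)

# Propensity score: predicted probability of treatment from a logistic regression
ps_model <- glm(treat ~ age + hours, data = sim, family = binomial)
sim$ps   <- fitted(ps_model)

# Greedy 1:1 nearest-neighbour matching on the propensity score,
# without replacement (each control is used at most once)
treated  <- which(sim$treat == 1)
controls <- which(sim$treat == 0)
matched_controls <- integer(0)
for (i in treated) {
  if (length(controls) == 0) break
  j <- controls[which.min(abs(sim$ps[controls] - sim$ps[i]))]
  matched_controls <- c(matched_controls, j)
  controls <- setdiff(controls, j)
}
matched <- sim[c(treated[seq_along(matched_controls)], matched_controls), ]

# Standardized mean difference: difference in group means, scaled by the
# standard deviation of the treated group
smd <- function(x, g) (mean(x[g == 1]) - mean(x[g == 0])) / sd(x[g == 1])

balance <- rbind(
  before = c(age = smd(sim$age, sim$treat),
             hours = smd(sim$hours, sim$treat)),
  after  = c(age = smd(matched$age, matched$treat),
             hours = smd(matched$hours, matched$treat))
)
print(round(balance, 3))
```

Standardized mean differences closer to zero in the `after` row indicate that the matched control group resembles the treatment group more closely than the full control group did.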

In R, a widely used tool for matching is the "MatchIt" package. Let's use the MatchIt package to subset our NERD dataset so that our control group contains the observations that are most similar to those in our treatment group. There are many methods for matching data, but in this exercise we use MatchIt's defaults: nearest-neighbor matching on a propensity score estimated by logistic regression.

This exercise is part of the course

Causal Inference with R - Regression


Exercise instructions

  • 1) Build a model for Treatment based on all of the control variables.
  • 2) Subset our data to just the units who are likely to be in the treatment group.
  • 3) Use matching techniques to balance the dataset.
  • 4) Estimate a standard OLS regression model for communication skills (Communication) based on all other variables in the matched dataset (matched.NERD).
  • 5) Check the statistical significance of the regression on the matched data.
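The five steps above can be sketched end to end on simulated data. This is an illustration under stated assumptions, not the exercise solution: the data frame `NERD_sim`, its simulated values, and the choice of three controls are hypothetical, while `matchit()` and `match.data()` are the MatchIt functions named in the exercise. The sketch assumes the MatchIt package is installed.

```r
library(MatchIt)

# Hypothetical stand-in for NERD: three controls, a treatment, and an outcome
set.seed(123)
n <- 400
NERD_sim <- data.frame(
  control1 = rnorm(n),
  control2 = rnorm(n),
  control3 = rbinom(n, 1, 0.5)
)
NERD_sim$Treatment <- rbinom(
  n, 1, plogis(-1 + 0.8 * NERD_sim$control1 - 0.5 * NERD_sim$control2)
)
NERD_sim$Communication <- 50 + 2 * NERD_sim$Treatment +
  3 * NERD_sim$control1 + rnorm(n, sd = 5)

# Steps 1 and 3: model Treatment from the controls and match with the defaults
# (1:1 nearest-neighbour matching on a logistic-regression propensity score)
m_out <- matchit(Treatment ~ control1 + control2 + control3, data = NERD_sim)

# Step 2: keep only the matched rows; match.data() also appends bookkeeping
# columns such as the propensity score (`distance`) and `weights`
m_data <- match.data(m_out)

# Balance before and after matching
print(summary(m_out))

# Step 4: outcome regression on the matched data
# (glm() with its default Gaussian family gives the same fit as OLS)
fit <- glm(Communication ~ Treatment + control1 + control2 + control3,
           data = m_data)

# Step 5: estimate, standard error, t value, and p-value for Treatment
print(summary(fit)$coefficients["Treatment", ])
```

Because the simulated treatment group is smaller than the control group, one-to-one matching keeps every treated row and an equal number of control rows, so `m_data` has twice as many rows as there are treated units.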

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Note: You may want to refresh your memory of the variables in `NERD` with the str(), head(), or summary() commands.

# 1) Let's use the tools in the MatchIt package for R to try a propensity score approach to matching units in our treatment and control groups. The following syntax builds a model that predicts "Treatment" from all other control variables in our dataset `NERD` with the matchit() command: matchit(Treatment ~ control1 + control2 + control3 + control4 + control5, data = NERD). Use that syntax to create the object `match.it`:
    
    match.it <-


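# 2) Now let's subset the NERD dataset to the matched observations: the treated units plus the control units whose predicted probability of being in the treatment group is closest to theirs. The column index [1:ncol(NERD)] keeps only the original NERD columns and drops the extra columns that match.data() appends. We have generated that code for you, so select the following code and hit the "Run Code" button: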

	matched.NERD <- match.data(match.it)[1:ncol(NERD)]


# 3) Let's use the summary() command to inspect the result of the matching. Note that the balance tables described below are printed when summary() is applied to the matchit object `match.it`; applied to the new dataframe `matched.NERD`, summary() gives ordinary per-column summaries instead:

	summary()
    
# The "Summary of balance for all data" shows mean values of each variable across our original treatment and control groups, and the "Summary of balance for matched data" shows mean values of each variable across our matched dataset. You may notice that the mean values for most variables are now quite similar between the treatment and control groups. The "Sample sizes" output shows that the algorithm matched one observation in the control group to each observation in the treatment group.
    
    
# 4) Let's now estimate a regression model for communication skills with glm(), using all of our control variables and our matched dataset (matched.NERD). With its default Gaussian family, glm() gives the same fit as a standard OLS regression.

      Solution4<-glm()


# 5) Use the summary command on Solution4 to check its statistical significance. Is the effect of "Treatment" in Solution4 both positive and statistically significant? Answer with "Yes" or "No".
    
      summary(Solution4)
      Solution5<-""

# Note: For the sake of comparison, the syntax below shows our regression results when our data were unmatched. Notice that the treatment effect was positive.
      summary(FirstModel)
      
     