
Practice Computing Regression Discontinuity Effects

Let's continue with the NBA research example. In your workspace is a simulated data set, called NBA, that closely resembles the data analyzed by Berger and Pope. The data frame NBA contains game characteristics for over 18,000 NBA games between 1994 and 2009. Will you get the same results as they did? Let's find out!

The outcome variable we are interested in is called home.team.final.margin, which is the final margin of victory (or loss) for the home team. The running variable in this RDD is called home.team.halftime.margin, which records the scoring difference between the home and visiting teams at halftime. Let's keep things simple and take a look at when the home team is winning at halftime, and define our treatment variable as the home team being ahead at halftime, i.e. when home.team.halftime.margin > 0.

In this exercise, you will compute the treatment effect of being ahead at halftime on the final margin of victory. You will use regression methods as well as nonparametric methods to assess how robust the effect is.

This exercise is part of the course

Causal Inference with R - Instrumental Variables & RDD


Exercise instructions

  • 1) Use OLS regression to estimate the treatment effect of being ahead at halftime on the final margin of victory under two different parametric scenarios (linear and quadratic)
  • 2) Estimate the treatment effect using non-parametric methods

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Before running a regression model, let's examine the data:
  
    str(NBA)

# The dataset contains eight variables: a game identifier, a season identifier, two team identifiers (for home and visitor), the quality of the home and visiting teams (a normalized version of win percentage), and the halftime and end-of-game margins of victory for the home team.

# 1)  Let's start by creating a dummy variable that equals 1 if the home team is ahead at halftime and 0 otherwise, and store it in the variable `home.team.winning.at.half` in the data frame `NBA`. Note that a logical expression like `x > y` in R returns `TRUE` or `FALSE`, but you can convert this into a numeric value of 1 or 0 by wrapping the expression in the `as.numeric()` function, like so: as.numeric(x > y). Enter the proper syntax between the () provided below to create the dummy variable for games in which the home team has a positive point margin at halftime:

    NBA$home.team.winning.at.half <- as.numeric()
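As an illustration of the logical-to-numeric conversion described above, here is a minimal sketch on a tiny made-up data frame. The data frame `toy` and its values are hypothetical and are not part of the course's `NBA` data; only the column names mirror the exercise.

```r
# Hypothetical toy data: five invented halftime margins (not the NBA data frame)
toy <- data.frame(home.team.halftime.margin = c(-7, -1, 0, 3, 12))

# The comparison returns TRUE/FALSE; as.numeric() turns that into 1/0
toy$home.team.winning.at.half <- as.numeric(toy$home.team.halftime.margin > 0)

print(toy)
```

Note that with a strict `> 0` comparison, a game tied at halftime (margin 0) is coded as 0, i.e. not treated.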


# 2) Let's assume that the effect of the halftime margin on the final margin is a linear one. Use the lm function to compute the RD estimate of the effect of being ahead at halftime on the final margin of victory, using the dummy variable you just created as the treatment. Include the following as additional controls in the regression, each of which is a separate variable listed in the str() results: home team's halftime margin, home team quality, and away team quality.

    summary(lm())
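To make the shape of such a regression concrete, here is a hedged sketch on simulated data rather than the course's `NBA` data frame. The data frame `sim`, the sample size, and the built-in jump of 2 points at the cutoff are all assumptions made up for illustration; the column names simply mirror those used in the exercise.

```r
set.seed(1)
n <- 5000

# Hypothetical simulated games; values are invented, names mirror the exercise
sim <- data.frame(
  home.team.halftime.margin = round(rnorm(n, mean = 1, sd = 8)),
  home.team.qual = rnorm(n),
  away.team.qual = rnorm(n)
)
sim$home.team.winning.at.half <- as.numeric(sim$home.team.halftime.margin > 0)

# Assumed data-generating process: a jump of 2 points at the cutoff
sim$home.team.final.margin <- 2 * sim$home.team.winning.at.half +
  sim$home.team.halftime.margin +
  3 * sim$home.team.qual - 3 * sim$away.team.qual +
  rnorm(n, sd = 5)

# Linear RD: treatment dummy plus the running variable and the two controls
fit_linear <- lm(home.team.final.margin ~ home.team.winning.at.half +
                   home.team.halftime.margin +
                   home.team.qual + away.team.qual,
                 data = sim)
summary(fit_linear)

# The coefficient on the dummy is the estimated jump at the cutoff
rd_linear <- coef(fit_linear)[["home.team.winning.at.half"]]
```

Because the jump was built into the simulation, the coefficient on `home.team.winning.at.half` should land near 2 here; with the real data the true effect is of course unknown.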

# Good. Here we tried using a linear model, but what if that effect is nonlinear?

# 3) Let's see what happens to our effect if we allow our model to handle a quadratic effect of halftime margin on the final margin. We can do this by adding I(home.team.halftime.margin^2) as an additional term in the model statement of lm.

    summary(lm())
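The quadratic specification can be sketched the same way. Again, this uses hypothetical simulated data (`sim`, with an assumed jump of 2 points), not the course's data; the only change from a linear specification is the extra `I(home.team.halftime.margin^2)` term, where `I()` tells R to treat `^2` as arithmetic rather than formula syntax.

```r
set.seed(2)
n <- 5000

# Hypothetical simulated games; values are invented, names mirror the exercise
sim <- data.frame(
  home.team.halftime.margin = round(rnorm(n, mean = 1, sd = 8)),
  home.team.qual = rnorm(n),
  away.team.qual = rnorm(n)
)
sim$home.team.winning.at.half <- as.numeric(sim$home.team.halftime.margin > 0)

# Assumed data-generating process: a jump of 2 points at the cutoff
sim$home.team.final.margin <- 2 * sim$home.team.winning.at.half +
  sim$home.team.halftime.margin +
  3 * sim$home.team.qual - 3 * sim$away.team.qual +
  rnorm(n, sd = 5)

# Quadratic RD: same model plus the squared running variable
fit_quadratic <- lm(home.team.final.margin ~ home.team.winning.at.half +
                      home.team.halftime.margin +
                      I(home.team.halftime.margin^2) +
                      home.team.qual + away.team.qual,
                    data = sim)
summary(fit_quadratic)

rd_quadratic <- coef(fit_quadratic)[["home.team.winning.at.half"]]
```

If the coefficient on the dummy barely moves when the squared term is added, that is one sign the estimate is not being driven by the linearity assumption.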

  
# 4) Nice going. But what if the effect is nonlinear and also not really quadratic? As a last exercise, let's relax the assumption that the effect of halftime margin is quadratic. Instead of estimating an OLS regression using the lm function, we will now use a package called 'rdd', which gives us more options to test how sensitive our RD estimate is to the regression assumptions.

# In this case, the function we will use is called RDestimate. Without getting into too many details, it fits a nonparametric model that we can compare to our linear and quadratic models. Run the following syntax as your answer to Question 4, and compare the results to the prior estimates:

    summary(RDestimate(home.team.final.margin ~ home.team.halftime.margin | home.team.qual + away.team.qual, data = NBA))
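To give a feel for what a nonparametric RD estimate does, here is a rough base-R sketch of a local linear regression with triangular kernel weights around the cutoff. It is not RDestimate's actual implementation: the simulated data frame `sim`, the assumed jump of 2 points, and the fixed bandwidth `h <- 5` are all hypothetical choices for illustration, whereas the package selects a bandwidth for you.

```r
set.seed(3)
n <- 5000

# Hypothetical simulated games; values are invented, names mirror the exercise
sim <- data.frame(
  home.team.halftime.margin = round(rnorm(n, mean = 1, sd = 8)),
  home.team.qual = rnorm(n),
  away.team.qual = rnorm(n)
)
sim$home.team.winning.at.half <- as.numeric(sim$home.team.halftime.margin > 0)

# Assumed data-generating process: a jump of 2 points at the cutoff
sim$home.team.final.margin <- 2 * sim$home.team.winning.at.half +
  sim$home.team.halftime.margin +
  3 * sim$home.team.qual - 3 * sim$away.team.qual +
  rnorm(n, sd = 5)

# Triangular kernel: weight 1 at the cutoff, falling to 0 at distance h
h <- 5  # hypothetical fixed bandwidth, chosen by hand for this sketch
sim$kernel.weight <- pmax(0, 1 - abs(sim$home.team.halftime.margin) / h)

# Weighted regression on games near the cutoff, with a separate slope
# on each side of the cutoff (the dummy-by-margin interaction)
fit_local <- lm(home.team.final.margin ~
                  home.team.winning.at.half * home.team.halftime.margin +
                  home.team.qual + away.team.qual,
                data = sim, weights = kernel.weight,
                subset = kernel.weight > 0)

rd_local <- coef(fit_local)[["home.team.winning.at.half"]]
print(rd_local)
```

The design choice here is to trust only games close to the cutoff and to let the slope differ on either side, so the estimate depends less on the functional form far from a tied halftime score, at the cost of a larger standard error.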


# 5) Now compare the results of our three causal effect estimates: the linear model and our two nonlinear models (with quadratic and nonparametric functions). Which kind of estimate do you think most closely captures the true causal effect of being behind (or ahead) at halftime on the final margin of victory? Write in "linear" or "nonlinear" as the answer to Solution5.

    Solution5<-""