Fitting a line to a binary response

When our response variable is binary, a linear regression model has several limitations. One of the more obvious, and logically incongruous, is that the regression line extends infinitely in both directions. Even though our response variable \(y\) takes only the values 0 and 1, our fitted values \(\hat{y}\) can range anywhere from \(-\infty\) to \(\infty\). That makes no sense if we want to read \(\hat{y}\) as a probability, which must lie between 0 and 1.
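The problem with unbounded fitted values can be seen in a minimal sketch. The data below are made up for illustration (they are not the MedGPA data), and the names gpa, accepted, fit, and preds are hypothetical: a 0/1 response switches from 0 to 1 at a GPA of 3.4, and a straight line is fit through it with lm().

```r
# Made-up illustration data (NOT the MedGPA data): 55 evenly spaced GPAs
# and a binary response that is 1 whenever GPA exceeds 3.4.
gpa <- seq(2.7, 4.0, length.out = 55)
accepted <- as.integer(gpa > 3.4)

# Ordinary least-squares line through the 0/1 response.
fit <- lm(accepted ~ gpa)

# Fitted values at a low, a middle, and a high GPA.
preds <- predict(fit, newdata = data.frame(gpa = c(2.0, 3.5, 4.5)))
print(preds)

# The line keeps going past both ends of the valid probability range.
print(preds < 0 | preds > 1)
```

For this toy data, the prediction at the lowest GPA falls below 0 and the prediction at the highest GPA rises above 1, so neither can be interpreted as a probability.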

To see this in action, we'll fit a linear regression model to data about 55 students who applied to medical school. We want to understand how their undergraduate \(GPA\) relates to the probability they will be accepted by a particular school \((Acceptance)\).

Instructions

The medical school acceptance data is loaded in your workspace as MedGPA.

  • Create a scatterplot called data_space for Acceptance as a function of GPA. Use geom_jitter() to apply a small amount of jitter to the points in the \(y\)-direction by setting width = 0 and height = 0.05.
  • Use geom_smooth() to add the simple linear regression line to data_space.
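
One possible way to carry out the two steps above is sketched here, assuming ggplot2 is installed and that MedGPA has columns named GPA and Acceptance. So the snippet also runs outside the exercise workspace, it falls back to a simulated stand-in data frame when MedGPA is not found; that stand-in is invented for illustration and is not the real data.

```r
library(ggplot2)

# MedGPA is assumed to be in the workspace, as the exercise states.
# If it is missing, build a made-up stand-in with the same two columns.
if (!exists("MedGPA")) {
  set.seed(1)
  MedGPA <- data.frame(GPA = round(runif(55, 2.7, 4.0), 2))
  MedGPA$Acceptance <- rbinom(55, 1, plogis(5 * (MedGPA$GPA - 3.5)))
}

# Scatterplot of Acceptance against GPA, jittered only vertically
# so the 0/1 points do not sit on top of each other.
data_space <- ggplot(MedGPA, aes(x = GPA, y = Acceptance)) +
  geom_jitter(width = 0, height = 0.05, alpha = 0.5)

# Add the simple linear regression line (no confidence band).
data_space +
  geom_smooth(method = "lm", se = FALSE)
```

The alpha = 0.5 and se = FALSE settings are optional choices that the instructions do not ask for. They only make overlapping points and the fitted line easier to see.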