Exercise

Log-odds scale

Previously, we considered two formulations of logistic regression models:

  • on the probability scale, the units are easy to interpret, but the function is non-linear, which makes it hard to understand
  • on the odds scale, the units are harder (but not impossible) to interpret, and the function is exponential, which makes it harder (but not impossible) to understand

We'll now add a third formulation:

  • on the log-odds scale, the units are nearly impossible to interpret, but the function is linear, which makes it easy to understand
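Written out for a model with a single explanatory variable \(x\), the three formulations can be sketched as follows. The symbols \(p\) for the probability of success and \(\beta_0\) for the intercept are notation assumed here for illustration; \(\beta_1\) is the slope.

$$ p = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} $$

$$ odds(p) = \frac{p}{1 - p} = \exp(\beta_0 + \beta_1 x) $$

$$ \log(odds(p)) = \beta_0 + \beta_1 x $$

Only the last form is a straight line in \(x\).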

As you can see, none of these three is uniformly superior. Most people tend to interpret the fitted values on the probability scale and the function on the log-odds scale. The interpretation of the coefficients is most commonly done on the odds scale. Recall that we interpreted our slope coefficient \(\beta_1\) in linear regression as the expected change in \(y\) given a one-unit change in \(x\). On the probability scale, the function is non-linear, so this approach won't work. On the log-odds scale, the function is linear, but the units are not interpretable (what does the \(\log\) of the odds mean?). However, on the odds scale, a one-unit change in \(x\) leads to the odds being multiplied by a factor of \(\exp(\beta_1)\). To see why, we form the odds ratio:

$$ OR = \frac{odds(\hat{y} | x + 1 )}{ odds(\hat{y} | x )} = \exp{\beta_1} $$
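The second equality follows because the odds are exponential in \(x\). A short derivation, assuming a single explanatory variable and writing \(\beta_0\) for the intercept (notation assumed here):

$$ \frac{odds(\hat{y} | x + 1)}{odds(\hat{y} | x)} = \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_0 + \beta_1 x + \beta_1 - \beta_0 - \beta_1 x) = \exp(\beta_1) $$

The ratio does not depend on \(x\), so the same multiplicative factor applies at every starting value.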

Thus, the exponentiated coefficient \(\exp(\beta_1)\) is the factor by which the expected odds are multiplied for a one-unit increase in the explanatory variable. It is tempting to interpret this as a multiplicative change in the expected probability, but this is wrong and can lead to nonsensical predictions (e.g. expected probabilities greater than 1).
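A small worked example may help here. The value \(\exp(\beta_1) = 2\) is hypothetical, chosen only to keep the arithmetic simple. Starting from a probability of \(0.9\), the correct route goes through the odds:

$$ odds(0.9) = \frac{0.9}{1 - 0.9} = 9, \qquad 9 \times 2 = 18, \qquad p = \frac{18}{1 + 18} \approx 0.947 $$

Applying the factor directly to the probability would instead give

$$ 0.9 \times 2 = 1.8 > 1 $$

which is not a valid probability. The change in probability also depends on the starting point: from \(p = 0.5\) the odds go from \(1\) to \(2\), so the probability rises to \(2/3 \approx 0.667\), a much larger jump than from \(0.9\) to \(0.947\).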

Instructions
  • Add a variable called log_odds to MedGPA_binned that records the log-odds of being accepted for each bin. Recall that \(odds(p) = p / (1-p)\), so the log-odds are \(\log(p / (1-p))\).
  • Create a scatterplot called data_space for log_odds as a function of mean_GPA using the binned data in MedGPA_binned. Use geom_line() to connect the points.
  • Add a variable called log_odds_hat to MedGPA_plus that records the predicted log-odds of being accepted for each observation.
  • Use geom_line() to illustrate the model through the fitted values. Note that you should be plotting the predicted log-odds, \(\log(\widehat{odds})\).