Exercise

# Log-odds scale

Previously, we considered two formulations of logistic regression models:

- on the probability scale, the units are easy to interpret, but the function is non-linear, which makes it hard to understand
- on the odds scale, the units are harder (but not impossible) to interpret, and the function is exponential, which makes it harder (but not impossible) to understand

We'll now add a third formulation:

- on the log-odds scale, the units are nearly impossible to interpret, but the function is linear, which makes it easy to understand

As you can see, none of these three is uniformly superior. Most people tend to interpret the fitted values on the probability scale and the function on the log-odds scale. The interpretation of the coefficients is most commonly done on the odds scale. Recall that we interpreted our slope coefficient \(\beta_1\) in *linear* regression as the expected change in \(y\) given a one unit change in \(x\). On the probability scale, the function is non-linear and so this approach won't work. On the log-odds scale, the function is linear, but the units are not interpretable (what does the \(\log\) of the odds mean?). However, on the odds scale, a one unit change in \(x\) leads to the odds being multiplied by a factor of \(\exp(\beta_1)\). To see why, we form the **odds ratio**:

$$ OR = \frac{odds(\hat{y} | x + 1 )}{ odds(\hat{y} | x )} = \exp{\beta_1} $$
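
The last equality can be checked by writing out the model on the odds scale, where the odds are an exponential function of \(x\). The intercept symbol \(\beta_0\) is an assumption here, since it is not named above:

$$ \frac{odds(\hat{y} | x + 1 )}{ odds(\hat{y} | x )} = \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1) $$

The intercept and the \(\beta_1 x\) term cancel, so the ratio does not depend on the starting value of \(x\).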

Thus, the exponentiated coefficient \(\exp(\beta_1)\) tells us how the expected *odds* change for a one unit increase in the explanatory variable. It is tempting to interpret this as a change in the expected *probability*, but this is wrong and can lead to nonsensical predictions (e.g., expected probabilities greater than 1).
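
A small numerical check may make this concrete. The sketch below uses simulated data rather than the course data; the sample size, the range of `x`, and the coefficients are made up for illustration.

```r
# Simulated data: the coefficients -10 and 3 are arbitrary illustrative values
set.seed(42)
x <- runif(500, min = 2.5, max = 4)
p <- plogis(-10 + 3 * x)              # true probabilities from a logistic model
y <- rbinom(500, size = 1, prob = p)

# Fit a logistic regression and extract the slope
mod <- glm(y ~ x, family = binomial)
b1 <- unname(coef(mod)["x"])

# Fitted probabilities at x = 3 and x = 4, converted to odds
p_hat <- predict(mod, newdata = data.frame(x = c(3, 4)), type = "response")
odds_hat <- p_hat / (1 - p_hat)

# Ratio of the odds for a one unit increase in x
odds_ratio <- unname(odds_hat[2] / odds_hat[1])

# The odds ratio agrees with exp(b1) up to floating point error
all.equal(odds_ratio, exp(b1))

# Multiplying the *probability* by exp(b1) is not a valid prediction:
# compare it with the fitted probability at x = 4
p_hat[1] * exp(b1)
p_hat[2]
```

Comparing the last two values shows why the multiplicative interpretation applies to odds only: the scaled probability is not constrained to stay below 1, while the fitted probability always is.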

Instructions

- Add a variable called `log_odds` to `MedGPA_binned` that records the log-odds of being accepted for each bin. Recall that \(odds(p) = p / (1-p)\).
- Create a scatterplot called `data_space` for `log_odds` as a function of `mean_GPA` using the binned data in `MedGPA_binned`. Use `geom_line()` to connect the points.
- Add a variable called `log_odds_hat` to `MedGPA_plus` that records the predicted log-odds of being accepted for each observation.
- Use `geom_line()` to illustrate the model through the fitted values. Note that you should be plotting the fitted log-odds, \(\log{\widehat{odds}}\).
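
One possible shape for these steps is sketched below, assuming `dplyr` and `ggplot2` are installed. The two data frames are small hypothetical stand-ins, not the course data, and the column names `acceptance_rate`, `GPA`, and `.fitted` are assumptions about how the real objects are laid out.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for MedGPA_binned (values and `acceptance_rate` are assumed)
MedGPA_binned <- data.frame(
  mean_GPA = c(3.0, 3.3, 3.6, 3.9),
  acceptance_rate = c(0.2, 0.4, 0.65, 0.85)
)

# Log-odds of acceptance per bin: log(p / (1 - p))
MedGPA_binned <- MedGPA_binned %>%
  mutate(log_odds = log(acceptance_rate / (1 - acceptance_rate)))

# Scatterplot of log-odds against mean GPA, with the points connected
data_space <- ggplot(MedGPA_binned, aes(x = mean_GPA, y = log_odds)) +
  geom_point() +
  geom_line()

# Hypothetical stand-in for MedGPA_plus: fitted probabilities in `.fitted`,
# generated from made-up logistic coefficients
GPA <- c(2.9, 3.2, 3.5, 3.8, 4.0)
MedGPA_plus <- data.frame(GPA = GPA, .fitted = plogis(-15 + 4.5 * GPA))

# Predicted log-odds for each observation
MedGPA_plus <- MedGPA_plus %>%
  mutate(log_odds_hat = log(.fitted / (1 - .fitted)))

# Overlay the model: on the log-odds scale the fitted values fall on a line
data_space +
  geom_line(data = MedGPA_plus, aes(x = GPA, y = log_odds_hat), color = "red")
```

With the real course objects, only the two `mutate()` calls and the plotting calls would be needed; the stand-in data frames exist so the sketch runs on its own.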