Exercise

# R-squared vs. adjusted R-squared

Two common measures of how well a model fits the data are \(R^2\) (the coefficient of determination) and the adjusted \(R^2\). The former measures the percentage of the variability in the response variable that is explained by the model. To compute this, we define
$$
R^2 = 1 - \frac{SSE}{SST} \,,
$$
where \(SSE\) and \(SST\) are the sum of the squared residuals and the total sum of squares, respectively. One issue with this measure is that the \(SSE\) can only decrease as new variables are added to the model, while the \(SST\) depends only on the response variable and therefore is not affected by changes to the model. This means that you can increase \(R^2\) by adding *any* additional variable to your model—even random noise.
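As a quick illustration of this definition (a sketch assuming a fitted `lm()` object named `mod`, as in this exercise), \(R^2\) can be computed by hand from the residuals and the response:

```r
# Assumes `mod` is a fitted lm() model, as in this exercise.
y   <- model.response(model.frame(mod))  # the response variable used by mod
SSE <- sum(residuals(mod)^2)             # sum of squared residuals
SST <- sum((y - mean(y))^2)              # total sum of squares
1 - SSE / SST                            # equals summary(mod)$r.squared
```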

The adjusted \(R^2\) includes a term that penalizes a model for each additional explanatory variable (where \(n\) is the number of observations and \(p\) is the number of explanatory variables). $$ R^2_{adj} = 1 - \frac{SSE}{SST} \cdot \frac{n-1}{n-p-1} \,, $$
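The penalty factor can likewise be checked by hand (again a sketch assuming a fitted model named `mod`):

```r
# Assumes `mod` is a fitted lm() model, as in this exercise.
y   <- model.response(model.frame(mod))
SSE <- sum(residuals(mod)^2)
SST <- sum((y - mean(y))^2)
n   <- nobs(mod)                          # number of observations
p   <- length(coef(mod)) - 1              # explanatory variables (exclude intercept)
1 - (SSE / SST) * (n - 1) / (n - p - 1)   # equals summary(mod)$adj.r.squared
```

Because the factor \(\frac{n-1}{n-p-1}\) grows with \(p\), a new variable only raises the adjusted \(R^2\) if it reduces the \(SSE\) by more than the penalty costs.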

We can see both measures in the output of the `summary()` function on our model object.

Instructions

**100 XP**

- Use `summary()` to compute \(R^2\) and adjusted \(R^2\) on the model object called `mod`.
- Use `mutate()` and `rnorm()` to add a new variable called `noise` to the `mario_kart` data set that consists of random noise. Save the new data frame as `mario_kart_noisy`.
- Use `lm()` to fit a model that includes `wheels`, `cond`, and the random noise term.
- Use `summary()` to compute \(R^2\) and adjusted \(R^2\) on the new model object. Did the value of \(R^2\) increase? What about adjusted \(R^2\)?
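The steps above might be sketched as follows. This is an illustrative sketch, not the official solution: it assumes the response variable is `totalPr` and that `mod` was fit as `lm(totalPr ~ wheels + cond, data = mario_kart)`, which may differ from your setup.

```r
library(dplyr)

# Assumed setup (hypothetical names): mod <- lm(totalPr ~ wheels + cond, data = mario_kart)
summary(mod)  # note "Multiple R-squared" and "Adjusted R-squared" in the output

# Add a column of pure random noise to the data
mario_kart_noisy <- mario_kart %>%
  mutate(noise = rnorm(n()))

# Refit the model with the noise term included
mod2 <- lm(totalPr ~ wheels + cond + noise, data = mario_kart_noisy)

summary(mod2)  # R-squared can only go up; adjusted R-squared typically goes down
```

Comparing the two summaries shows the point of the exercise: \(R^2\) rises (slightly) even for a meaningless predictor, while the adjusted \(R^2\) penalizes the extra term.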