Exercise

R-squared vs. adjusted R-squared

Two common measures of how well a model fits the data are \(R^2\) (the coefficient of determination) and the adjusted \(R^2\). The former measures the proportion of the variability in the response variable that is explained by the model. To compute this, we define $$ R^2 = 1 - \frac{SSE}{SST} \,, $$ where \(SSE\) and \(SST\) are the sum of the squared residuals and the total sum of squares, respectively. One issue with this measure is that the \(SSE\) can only decrease as new variables are added to the model, while the \(SST\) depends only on the response variable and therefore is not affected by changes to the model. This means that you can increase \(R^2\) by adding any additional variable to your model—even random noise.

The adjusted \(R^2\) includes a term that penalizes the model for each additional explanatory variable: $$ R^2_{adj} = 1 - \frac{SSE}{SST} \cdot \frac{n-1}{n-p-1} \,, $$ where \(n\) is the number of observations and \(p\) is the number of explanatory variables.
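To make the two formulas concrete, here is a minimal sketch that computes both measures by hand and compares them to the values `summary()` reports. It uses R's built-in `mtcars` data rather than the exercise data, so the variable names (`mpg`, `wt`, `hp`) are illustrative assumptions, not part of the exercise:

```r
# Sketch: compute R-squared and adjusted R-squared by hand
# (assumes the built-in mtcars data, not the exercise's mario_kart data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

sse <- sum(residuals(fit)^2)                    # sum of squared residuals
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
n   <- nrow(mtcars)                             # number of observations
p   <- 2                                        # number of explanatory variables

r2     <- 1 - sse / sst
r2_adj <- 1 - (sse / sst) * (n - 1) / (n - p - 1)

# These should agree with the values summary(fit) prints as
# "Multiple R-squared" and "Adjusted R-squared":
c(r2, summary(fit)$r.squared)
c(r2_adj, summary(fit)$adj.r.squared)
```

Note that \(\frac{n-1}{n-p-1} > 1\) whenever \(p > 0\), so \(R^2_{adj} \le R^2\), and the gap grows as more explanatory variables are added.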

Both measures appear in the output of the summary() function applied to our model object.

Instructions
100 XP
  • Use summary() to compute \(R^2\) and adjusted \(R^2\) on the model object called mod.
  • Use mutate() and rnorm() to add a new variable called noise to the mario_kart data set that consists of random noise. Save the new data frame as mario_kart_noisy.
  • Use lm() to fit a model that includes wheels, cond, and the random noise term.
  • Use summary() to compute \(R^2\) and adjusted \(R^2\) on the new model object. Did the value of \(R^2\) increase? What about adjusted \(R^2\)?
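The steps above can be sketched as follows. This is one possible solution, assuming (as is not stated in the instructions) that the response variable in mod is totalPr and that mario_kart contains columns named wheels and cond; adjust the names to match the actual data:

```r
# Sketch of the exercise workflow; totalPr is an assumed response name
library(dplyr)

# 1. Inspect R-squared and adjusted R-squared for the original model
summary(mod)

# 2. Add a column of pure random noise, unrelated to the response
mario_kart_noisy <- mario_kart %>%
  mutate(noise = rnorm(n()))

# 3. Refit the model with the noise term included
mod_noisy <- lm(totalPr ~ wheels + cond + noise, data = mario_kart_noisy)

# 4. Compare the fit measures
summary(mod_noisy)
```

Because SSE can only decrease when a variable is added, \(R^2\) will not go down (it typically ticks up slightly even for noise), whereas adjusted \(R^2\) will usually decrease, since the penalty term outweighs the negligible reduction in SSE.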