Exercise

Calculate R-Squared

Now that you've calculated the RMSE of your model's predictions, you will examine how well the model fits the data: that is, how much of the variance in the outcome it explains. You can do this using \(R^2\).

Suppose \(y\) is the true outcome, \(p\) is the prediction from the model, and \(res = y - p\) are the residuals of the predictions.

Then the total sum of squares \(tss\) ("total variance") of the data is:

$$ tss = \sum{(y - \overline{y})^2} $$

where \(\overline{y}\) is the mean value of \(y\).

The residual sum of squared errors of the model, \(rss\), is:

$$ rss = \sum{res^2} $$

\(R^2\) (R-Squared), the "variance explained" by the model, is then:

$$ R^2 = 1 - \frac{rss}{tss} $$
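
As a quick check of the formulas above, here is a minimal sketch in base R. The vectors y and p hold made-up toy values chosen only for illustration; they are not the exercise data.

```r
# Toy values, invented for illustration only
y <- c(3, 5, 7, 9)           # true outcomes
p <- c(2.5, 5.5, 6.5, 9.5)   # model predictions

res <- y - p                   # residuals of the predictions
tss <- sum((y - mean(y))^2)    # total sum of squares
rss <- sum(res^2)              # residual sum of squared errors
rsq <- 1 - rss / tss           # R-squared

print(c(tss = tss, rss = rss, rsq = rsq))
```

With these toy values, tss is 20, rss is 1, and \(R^2\) is 0.95, so the predictions account for most of the variation in y.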

After you calculate \(R^2\), you will compare what you computed with the \(R^2\) reported by glance(). glance() returns a one-row data frame; for a linear regression model, one of the columns returned is the \(R^2\) of the model on the training data.

The data frame unemployment is in your workspace, with the columns predictions and residuals that you calculated in a previous exercise.

Instructions

The data frame unemployment and the model unemployment_model are in the workspace.

  • Calculate the mean female_unemployment and assign it to the variable fe_mean.
  • Calculate the total sum of squares and assign it to the variable tss.
  • Calculate the residual sum of squares and assign it to the variable rss.
  • Calculate \(R^2\). Is it a good fit (\(R^2\) near 1)?
  • Use glance() to get \(R^2\) from the model. Is it the same as what you calculated?
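
The steps above might be sketched as follows. This is an illustration under assumptions, not the exercise solution: the data frame is a stand-in with invented values, the predictor name male_unemployment is hypothetical, and the model is refit here so that the block runs on its own. The sketch uses glance() from the broom package when it is installed and otherwise falls back to summary().

```r
# Stand-in data: values invented for illustration, not the course data set.
# The predictor name male_unemployment is an assumption.
unemployment <- data.frame(
  male_unemployment   = c(2.9, 6.7, 4.9, 7.9, 9.8, 6.9, 6.1, 6.2),
  female_unemployment = c(4.0, 7.4, 5.0, 7.2, 7.9, 6.1, 6.0, 5.8)
)

# Refit a linear model and add the columns the exercise expects
unemployment_model <- lm(female_unemployment ~ male_unemployment,
                         data = unemployment)
unemployment$predictions <- predict(unemployment_model)
unemployment$residuals <- unemployment$female_unemployment -
  unemployment$predictions

# Mean outcome, total sum of squares, residual sum of squares, R-squared
fe_mean <- mean(unemployment$female_unemployment)
tss <- sum((unemployment$female_unemployment - fe_mean)^2)
rss <- sum(unemployment$residuals^2)
rsq <- 1 - rss / tss

# R-squared as reported by the model: glance() if broom is available,
# otherwise the same quantity from summary()
rsq_model <- if (requireNamespace("broom", quietly = TRUE)) {
  broom::glance(unemployment_model)$r.squared
} else {
  summary(unemployment_model)$r.squared
}

# TRUE when the hand-computed and reported values agree
print(all.equal(rsq, rsq_model))
```

For a linear model with an intercept, evaluated on its own training data, the hand-computed value and the reported value should agree up to floating-point tolerance.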