
Calculate R-squared

Now that you've calculated the RMSE of your model's predictions, you will examine how well the model fits the data, that is, how much of the variance in the outcome it explains. You can do this using \(R^2\).

Suppose \(y\) is the true outcome, \(p\) is the prediction from the model, and \(res = y - p\) are the residuals of the predictions.

Then the total sum of squares \(tss\) ("total variance") of the data is:

$$ tss = \sum{(y - \overline{y})^2} $$

where \(\overline{y}\) is the mean value of \(y\).

The residual sum of squared errors of the model, \(rss\), is:

$$ rss = \sum{res^2} $$

\(R^2\) (R-squared), the "variance explained" by the model, is then:

$$ R^2 = 1 - \frac{rss}{tss} $$
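To make the formulas concrete, here is a minimal sketch on a handful of made-up numbers. The vectors `y` and `p` are hypothetical and are not taken from the unemployment data; they are chosen so the arithmetic is easy to check by hand.

```r
# Toy values, made up for illustration only
y <- c(2, 4, 6, 8, 10)   # true outcomes
p <- c(3, 4, 5, 8, 10)   # hypothetical model predictions

res <- y - p                  # residuals: -1, 0, 1, 0, 0
tss <- sum((y - mean(y))^2)   # mean(y) is 6, so 16 + 4 + 0 + 4 + 16 = 40
rss <- sum(res^2)             # 1 + 0 + 1 + 0 + 0 = 2
rsq <- 1 - rss / tss          # 1 - 2/40 = 0.95

print(c(tss = tss, rss = rss, rsq = rsq))
```

The closer \(rss\) is to zero relative to \(tss\), the closer \(R^2\) gets to 1.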

After you calculate \(R^2\), you will compare what you computed with the \(R^2\) reported by glance() (from the broom package). glance() returns a one-row data frame; for a linear regression model, one of the columns returned is the \(R^2\) of the model on the training data.
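The comparison can be tried out on any linear model. The sketch below uses R's built-in `cars` data set as a stand-in (an assumption for illustration, not the course data) and checks the hand-computed value against `summary()`. The `glance()` part only runs if the broom package happens to be installed.

```r
# Built-in cars data, used here only as a stand-in for the course data
model <- lm(dist ~ speed, data = cars)

# R-squared by hand, using the residuals on the training data
y <- cars$dist
rsq_manual <- 1 - sum(residuals(model)^2) / sum((y - mean(y))^2)

# R-squared as reported by base R
rsq_summary <- summary(model)$r.squared
print(all.equal(rsq_manual, rsq_summary))

# glance() comes from broom; only run this part if broom is installed
if (requireNamespace("broom", quietly = TRUE)) {
  g <- broom::glance(model)   # one-row data frame of model statistics
  print(nrow(g))
  print(all.equal(rsq_manual, g$r.squared))
}
```

Note that the two values are expected to agree here because the model includes an intercept and is evaluated on the same data it was fit to.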

The unemployment data frame has been loaded for you, and contains the columns predictions and residuals that you calculated in a previous exercise. The unemployment_model is also available for you to use.

This exercise is part of the course Supervised Learning in R: Regression.

Exercise instructions

  • Calculate the mean female_unemployment and assign it to the variable fe_mean.
  • Calculate the total sum of squares and assign it to the variable tss.
  • Calculate the residual sum of squares and assign it to the variable rss.
  • Calculate \(R^2\). Is it a good fit (\(R^2\) near 1)?
  • Use glance() to get \(R^2\) from the model. Is it the same as what you calculated?

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# unemployment is available
summary(unemployment)

# unemployment_model is available
summary(unemployment_model)

# Calculate and print the mean female_unemployment: fe_mean
(fe_mean <- ___)

# Calculate and print the total sum of squares: tss
(tss <- ___((___ - ___)^2))

# Calculate and print residual sum of squares: rss
(rss <- ___)

# Calculate and print the R-squared: rsq
(rsq <- ___)

# Get R-squared from glance and print it
(rsq_glance <- ___(___)$___)
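
One way the blanks might be filled in is sketched below. Because the course data is not available outside the exercise, this version builds a small stand-in `unemployment` data frame from made-up values; the predictor name `male_unemployment` and the model formula are assumptions, not taken from the exercise text. If broom is not installed, the last step falls back to `summary()`.

```r
# Made-up stand-in values: NOT the course's unemployment data
unemployment <- data.frame(
  male_unemployment   = c(2.9, 6.7, 4.9, 7.9, 9.8, 6.9),  # hypothetical predictor
  female_unemployment = c(4.0, 7.4, 5.0, 7.2, 7.9, 6.1)
)

# Assumed model form; the exercise provides unemployment_model ready-made
unemployment_model <- lm(female_unemployment ~ male_unemployment,
                         data = unemployment)
unemployment$predictions <- predict(unemployment_model)
unemployment$residuals <- unemployment$female_unemployment -
  unemployment$predictions

# Calculate and print the mean female_unemployment: fe_mean
(fe_mean <- mean(unemployment$female_unemployment))

# Calculate and print the total sum of squares: tss
(tss <- sum((unemployment$female_unemployment - fe_mean)^2))

# Calculate and print residual sum of squares: rss
(rss <- sum(unemployment$residuals^2))

# Calculate and print the R-squared: rsq
(rsq <- 1 - rss / tss)

# Get R-squared from glance (or summary() if broom is unavailable)
rsq_glance <- if (requireNamespace("broom", quietly = TRUE)) {
  broom::glance(unemployment_model)$r.squared
} else {
  summary(unemployment_model)$r.squared
}
print(rsq_glance)
```

In the exercise itself, only the right-hand sides of the assignments are needed, since `unemployment` and `unemployment_model` are already loaded.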