Calculate R-squared
Now that you've calculated the RMSE of your model's predictions, you will examine how well the model fits the data, that is, how much of the variance it explains. You can do this using \(R^2\).
Suppose \(y\) is the true outcome, \(p\) is the prediction from the model, and \(res = y - p\) are the residuals of the predictions.
Then the total sum of squares \(tss\) ("total variance") of the data is:
$$ tss = \sum{(y - \overline{y})^2} $$
where \(\overline{y}\) is the mean value of \(y\).
The residual sum of squared errors of the model, \(rss\), is:
$$ rss = \sum{res^2} $$
\(R^2\) (R-squared), the "variance explained" by the model, is then:
$$ R^2 = 1 - \frac{rss}{tss} $$
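As a quick check of the formulas above, here is a minimal sketch in R on a made-up example. The vectors y and p are hypothetical values chosen for illustration; they are not the unemployment data used in the exercise.

```r
# Hypothetical toy data (not the unemployment data from the exercise)
y <- c(1, 2, 3, 4, 5)            # true outcomes
p <- c(1.1, 1.9, 3.2, 3.8, 5.0)  # model predictions

res <- y - p                     # residuals
tss <- sum((y - mean(y))^2)      # total sum of squares
rss <- sum(res^2)                # residual sum of squares
rsq <- 1 - rss / tss             # R-squared

print(c(tss = tss, rss = rss, rsq = rsq))
```

In this toy case tss is 10 and rss is about 0.1, so \(R^2\) is about 0.99: the predictions account for nearly all of the variation in y.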
After you calculate \(R^2\), you will compare what you computed with the \(R^2\) reported by glance(). glance() returns a one-row data frame; for a linear regression model, one of the columns returned is the \(R^2\) of the model on the training data.
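The comparison described here can be sketched as follows. This is an illustration under stated assumptions, not the exercise solution: R's built-in cars data set stands in for the exercise data, and glance() is assumed to come from the broom package, so the sketch only calls it when broom is installed.

```r
# Assumption: R's built-in cars data set stands in for the exercise data
model <- lm(dist ~ speed, data = cars)

# R-squared by hand, from the definitions of rss and tss
y <- cars$dist
res <- residuals(model)
rsq_manual <- 1 - sum(res^2) / sum((y - mean(y))^2)

# R-squared as reported by base R's model summary
rsq_summary <- summary(model)$r.squared

# glance() is assumed to come from broom; skip it if broom is not installed
rsq_glance <- if (requireNamespace("broom", quietly = TRUE)) {
  broom::glance(model)$r.squared
} else {
  NA_real_
}

print(c(manual = rsq_manual, summary = rsq_summary, glance = rsq_glance))
```

For a linear model fit with an intercept, the hand-computed value should agree with the reported one up to floating-point error, because both are evaluated on the training data.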
The unemployment data frame has been loaded for you, and contains the columns predictions and residuals that you calculated in a previous exercise. The unemployment_model is also available for you to use.
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Calculate the mean female_unemployment and assign it to the variable fe_mean.
- Calculate the total sum of squares and assign it to the variable tss.
- Calculate the residual sum of squares and assign it to the variable rss.
- Calculate \(R^2\). Is it a good fit (\(R^2\) near 1)?
- Use glance() to get \(R^2\) from the model. Is it the same as what you calculated?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# unemployment is available
summary(unemployment)
# unemployment_model is available
summary(unemployment_model)
# Calculate and print the mean female_unemployment: fe_mean
(fe_mean <- ___)
# Calculate and print the total sum of squares: tss
(tss <- ___((___ - ___)^2))
# Calculate and print residual sum of squares: rss
(rss <- ___)
# Calculate and print the R-squared: rsq
(rsq <- ___)
# Get R-squared from glance and print it
(rsq_glance <- ___(___)$___)