
R-squared

1. R-squared ($R^2$)

In this lesson you will learn about another key metric for evaluating regression models: R-squared.

2. What is $R^2$?

R-squared is a measure of how well a model fits the data. It is a value between 0 and 1. Values near 1 indicate a model that fits the data well; values near 0 indicate a model that does no better than always predicting the average value of the data.

3. Calculating $R^2$

R-squared is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. Since the total sum of squares is proportional to the variance of the data, R-squared is sometimes called the variance explained by the model. The residual sum of squares is the sum of the squared errors of the model's predictions. If the RSS is small compared to the total sum of squares, then R-squared is large, and the model fits the data well.
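
Written as a formula, with $y_i$ the actual values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the data:

$$RSS = \sum_i (y_i - \hat{y}_i)^2, \qquad SS_{Tot} = \sum_i (y_i - \bar{y})^2, \qquad R^2 = 1 - \frac{RSS}{SS_{Tot}}$$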

4. Calculate $R^2$ of the House Price Model: RSS

Let’s calculate the R-squared of the house price model on its training data. Again, price is the column of actual sale prices and prediction is the column of predicted sale prices. First, calculate the error between the prices and the predictions. Then square the errors and sum them. This is the residual sum of squares.
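
As a sketch of this step in R (the data frame name houseprices is assumed for illustration; only the price and prediction columns come from the lesson):

```r
# houseprices is an assumed data frame with columns price and prediction
err <- houseprices$price - houseprices$prediction  # error: actual minus predicted
rss <- sum(err^2)                                  # residual sum of squares
```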

5. Calculate $R^2$ of the House Price Model: $SS_{Tot}$

To calculate the total sum of squares, subtract the mean home price from each price, square the differences, and take the sum.
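
Continuing the sketch with the same assumed houseprices data frame:

```r
# Total sum of squares: squared differences from the mean price, summed
toterr <- houseprices$price - mean(houseprices$price)
sstot  <- sum(toterr^2)
```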

6. Calculate $R^2$ of the House Price Model

Finally, take the ratio of the residual sum of squares to the total sum of squares and subtract it from 1. This is the R-squared.
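
Putting the two sums from the sketch together:

```r
# R-squared: 1 minus the ratio of the residual sum of squares
# to the total sum of squares
rsq <- 1 - rss / sstot
```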

7. Reading $R^2$ from the lm() model

For lm models, you can read the R-squared from the summary of the model, as we show here. Calling glance() (from the broom package) on an lm model will also return the R-squared, and other diagnostics, in a data frame. However, not all regression algorithms in R return R-squared, so it’s good to know how to calculate it yourself.
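
A minimal sketch of both approaches, assuming a fitted lm model named hmod (the model name is assumed for illustration):

```r
# hmod is an assumed lm model, e.g. hmod <- lm(price ~ size, data = houseprices)
summary(hmod)$r.squared    # R-squared from the model summary

library(broom)
glance(hmod)$r.squared     # R-squared from glance()'s one-row data frame of diagnostics
```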

8. Correlation and $R^2$

For models that minimize squared error (like linear regression), the R-squared is the square of the correlation between the outcome and the prediction.
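
As a sketch, continuing the assumed houseprices example from above, the squared correlation should match the R-squared computed earlier:

```r
# Square of the correlation between outcome and prediction on the training data
rho <- cor(houseprices$price, houseprices$prediction)
rho^2   # should match rsq
```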

9. Correlation and $R^2$

I like this definition, because it says intuitively what you want to be true of a model: that the predictions and the true outcome are correlated. Note that this correspondence only holds for the data the model was trained on, not for new data.

10. Let's practice!

Now let’s practice fitting models and calculating R-squared.