
Goodness-of-Fit

1. Goodness-of-Fit

Previously in the course, we saw how to use RSS and least-squares to fit a model to data by finding the model parameter values that give the minimum RSS. In this lesson, we will explore methods of quantifying the "goodness-of-fit" for a model, and show how they differ from, but are related to, RSS.

2. 3 Different R's

There are three related and easily confused "R's" in linear models: RSS, RMSE, and R-squared. RSS was used to help you find the optimal values for the model parameters, and thus the best model. But even the best model will still have non-zero residuals. So if the fit is not perfect, how "good" is it? There are two common ways to quantify the goodness-of-fit for a linear model: RMSE and R-squared. In this lesson, you will compute both.

3. RMSE

RMSE is the most common way to quantify the goodness-of-fit. To compute RMSE, we start with the residuals and compute RSS as before. Dividing the RSS by the number of residuals, as if to normalize, we get the *mean* of the squared residuals rather than the sum. Here, the residuals can be thought of as modeling "errors," so we call this the mean squared error, or MSE. If we take the square root of MSE, we arrive at RMSE, the root mean squared error. If you recall the previous lesson on variance, you'll see that MSE has the form of a variance of the residuals, and RMSE plays the role of their standard deviation. Think of RMSE as providing a measure of how much the model "deviates" from the data.
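As a minimal sketch of those steps, here is the residuals-to-RMSE computation in numpy; the arrays x_data and y_data and the parameters a0, a1 are placeholders, not the course data.

```python
import numpy as np

x_data = np.array([0, 1, 2, 3, 4, 5], dtype=float)   # placeholder data
y_data = np.array([0.1, 1.2, 1.9, 3.2, 3.8, 5.1])    # placeholder data
a0, a1 = 0.0, 1.0                                     # assumed model parameters

y_model = a0 + a1 * x_data          # model predictions
residuals = y_model - y_data        # modeling "errors"
RSS = np.sum(np.square(residuals))  # residual sum of squares
MSE = RSS / len(residuals)          # mean squared error
RMSE = np.sqrt(MSE)                 # root mean squared error
print(RMSE)
```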

4. R-Squared in Code

More than goodness-of-fit, we'd like to know how much of the variation in the data is due to the linear trend and how much is not. R-squared is a quantitative measure of just that ratio. Let's see how R-squared is related to things you've already seen. First, recall that DEVIATIONS are the differences between the data points and the data mean. If you square and then sum the deviations, you get VAR, which captures all variation in the data, both the linear trend and the randomness. Next, recall that RESIDUALS are the differences between the data points and the MODEL. If you square and sum the residuals, you get RSS, which only captures the variation "left over" after we subtract out the modeled linear trend. R-squared is 1 minus the ratio of RSS to VAR. It can also be computed as the square of the correlation between the data and the model.
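Here is a short sketch of both routes to R-squared, reusing the placeholder data and model from the RMSE sketch above; for a least-squares linear fit, the two results agree.

```python
import numpy as np

x_data = np.array([0, 1, 2, 3, 4, 5], dtype=float)   # placeholder data
y_data = np.array([0.1, 1.2, 1.9, 3.2, 3.8, 5.1])    # placeholder data
y_model = 0.0 + 1.0 * x_data                          # assumed model predictions

deviations = y_data - np.mean(y_data)                 # data minus data mean
VAR = np.sum(np.square(deviations))                   # total variation in the data
residuals = y_model - y_data                          # data minus model
RSS = np.sum(np.square(residuals))                    # left-over variation
r_squared = 1 - RSS / VAR                             # R-squared from the ratio
r = np.corrcoef(y_data, y_model)[0, 1]                # correlation of data and model
print(r_squared, r**2)                                # agree for a least-squares fit
```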

5. R-Squared in Data

Let's see how RMSE and R-squared compare in a series of examples. We compare four data sets and fit a model to each. The first has zero slope. R-squared is 0. RMSE is 667.

6. R-Squared in Data

For the second, the slope increases, and R-squared increases to 0.21, but RMSE is unchanged.

7. R-Squared in Data

Third, slope increases again, R-squared increases to 0.89, but RMSE is still unchanged.

8. R-Squared in Data

Finally, slope is steepest, R-squared has increased to 0.97, and RMSE remains unchanged.

9. RMSE vs R-Squared

When the variation due to the linear trend is larger than the variation due to the residuals, the model is better. The effect is real and practical. The randomness of the residuals can completely mask the linear dependence for a small slope, but be relatively unimportant for a large slope. R-squared captures this effect, but RMSE does not.
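A hedged illustration of that effect: with the same residual noise in every case, a steeper slope raises R-squared while RMSE, which is set by the noise, stays roughly the same. The slopes and noise level below are assumed for illustration only, not the values from the slides.

```python
import numpy as np

np.random.seed(0)
x = np.linspace(0, 10, 100)
noise = np.random.normal(0, 2, size=x.size)   # same residual noise in every case

for slope in [0.0, 0.5, 2.0, 5.0]:
    y = slope * x + noise
    a1, a0 = np.polyfit(x, y, 1)              # least-squares fit
    y_model = a0 + a1 * x
    residuals = y_model - y
    RMSE = np.sqrt(np.mean(np.square(residuals)))
    VAR = np.sum(np.square(y - np.mean(y)))
    RSS = np.sum(np.square(residuals))
    r_squared = 1 - RSS / VAR
    print(f"slope={slope:>4}: RMSE={RMSE:.2f}, R-squared={r_squared:.2f}")
```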

10. Let's practice!

In the exercises that follow, let's see a few examples in context, computing RMSE and R-squared, with both statsmodels and scikit-learn.
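As a hedged preview of the two library approaches the exercises use, here is one way each library reports R-squared; the data here is synthetic and the variable names are placeholders, not the exercise data.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

x = np.linspace(0, 10, 50)
y = 2.0 * x + np.random.normal(0, 1, size=x.size)   # synthetic placeholder data

# statsmodels: OLS reports R-squared directly on the fitted results
ols_results = sm.OLS(y, sm.add_constant(x)).fit()
print(ols_results.rsquared)

# scikit-learn: score() returns R-squared for the fitted model
lr = LinearRegression().fit(x.reshape(-1, 1), y)
print(lr.score(x.reshape(-1, 1), y))
```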