1. Visualizing model fit
There are several plots that can help you assess the performance of a model. We'll look at these plots and their interpretation first, then the code to draw them.
2. Hoped-for properties of residuals
If a linear regression model is a good fit, then the residuals are approximately normally distributed, with mean zero.
3. Bream and perch again
Earlier we ran models on the bream and perch datasets. From looking at the scatter plots with linear trend lines, it appeared that the bream model was a reasonable fit, but the perch model wasn't because the observed masses increased faster than linearly with the lengths.
4. Residuals vs. fitted values
The first diagnostic plot is of residuals versus fitted values. The blue line is a LOESS trend line, which is a smooth curve following the data. LOESS trend lines aren't good for making predictions, but they are useful for visualizing trends.
If the residuals meet the assumption that they are normally distributed with mean zero, then the trend line should closely follow the y equals zero line on the plot. For the bream dataset, this is true.
By contrast, the perch model doesn't meet the assumption. The residuals are above zero when the fitted value is small or big, and below zero in the middle.
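If you want to see what goes into this plot, here is a minimal sketch in base R. It assumes the built-in cars dataset as a stand-in, since the bream and perch data aren't included here, and the names mdl and diag_df are just illustrative.

```r
# Stand-in data: the built-in cars dataset, not the course's bream or perch data
mdl <- lm(dist ~ speed, data = cars)

# Collect the two quantities the plot needs
diag_df <- data.frame(
  fitted = fitted(mdl),
  resid  = residuals(mdl)
)

# Scatter plot of residuals versus fitted values
plot(diag_df$fitted, diag_df$resid,
     xlab = "Fitted values", ylab = "Residuals")

# Dashed reference line at y = 0
abline(h = 0, lty = 2)

# LOESS trend line, drawn in order of increasing fitted value
smooth <- loess(resid ~ fitted, data = diag_df)
ord <- order(diag_df$fitted)
lines(diag_df$fitted[ord], predict(smooth)[ord], col = "blue")
```

If the blue curve stays close to the dashed line across the whole range of fitted values, that matches what you hope to see for a well-fitting model.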
5. Q-Q plot
The second plot type is called a Q-Q plot. It shows whether or not the residuals follow a normal distribution.
On the x-axis, the points are quantiles from the normal distribution. On the y-axis, you get the standardized residuals, which are the residuals divided by their standard deviation.
It sounds technical, but interpreting this plot is easy. If the points track along the straight line, the residuals are normally distributed. If not, they aren't.
Here, most of the bream points follow the line closely. Two points at each extreme don't follow the line. These are labelled 14 and 30, which correspond to the rows of the bream dataset where the bad residuals occur.
The perch residuals don't track the line as closely. In particular, you can see on the right hand side of the plot that the residuals are larger than expected. That means the model is a particularly poor fit for the longer lengths of perch.
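Here is a rough base R sketch of how a Q-Q plot like this can be built by hand. Again, the cars dataset is only a stand-in for the fish data, and mdl and std_resid are illustrative names. The rstandard function returns standardized residuals, that is, each residual divided by an estimate of its standard deviation.

```r
# Stand-in data: the built-in cars dataset, not the course's bream or perch data
mdl <- lm(dist ~ speed, data = cars)

# Standardized residuals for the y-axis
std_resid <- rstandard(mdl)

# Normal quantiles on the x-axis, standardized residuals on the y-axis
qqnorm(std_resid, ylab = "Standardized residuals")

# Reference line: points from a normal distribution should track it
qqline(std_resid)

# Row numbers of the three largest standardized residuals (in absolute value),
# similar to the row labels that appear on the diagnostic plot
extreme_rows <- head(order(abs(std_resid), decreasing = TRUE), 3)
extreme_rows
```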
6. Scale-location
The third plot shows the square root of the absolute value of the standardized residuals versus the fitted values. It's often called a scale-location plot, because that's easier to say.
Where the first plot showed whether or not the residuals go positive or negative as the fitted values change, this plot shows whether the size of the residuals gets bigger or smaller.
The residuals for the bream dataset get a little bigger as the fitted values increase, but it's not a huge change.
Again, the plot of the perch model has a trend line that goes up and down all over the place, indicating a poor fit.
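The quantity on the y-axis of the scale-location plot is easy to compute yourself. This is a sketch under the same assumptions as before: cars stands in for the fish data, and mdl and scale_loc are illustrative names. Taking the absolute value before the square root is what makes this plot about the size of the residuals rather than their sign.

```r
# Stand-in data: the built-in cars dataset, not the course's bream or perch data
mdl <- lm(dist ~ speed, data = cars)

# Square root of the absolute standardized residuals, alongside the fitted values
scale_loc <- data.frame(
  fitted = fitted(mdl),
  sqrt_abs_std_resid = sqrt(abs(rstandard(mdl)))
)

# scatter.smooth() draws the points and adds a LOESS trend line in one call
scatter.smooth(
  scale_loc$fitted, scale_loc$sqrt_abs_std_resid,
  xlab = "Fitted values",
  ylab = "Sqrt of abs. standardized residuals",
  lpars = list(col = "blue")
)
```

A roughly flat trend line suggests the size of the residuals doesn't change much as the fitted values change.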
7. autoplot()
Drawing the plots is straightforward. Load the ggplot2 and ggfortify packages, then call autoplot, passing the model object.
The only tricky part is the which argument. That tells autoplot which of the plots to draw. Unfortunately, they have numbers instead of descriptive names, so you have to read the documentation to remember which number corresponds to which plot. Fortunately, the argument is vectorized so you can draw all the plots at once.
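As a small sketch of the which argument, this draws just one plot. It assumes the ggplot2 and ggfortify packages are installed, and it uses the built-in cars dataset as a stand-in model. The numbering follows base R's diagnostic plots for linear models: 1 is residuals versus fitted values, 2 is the Q-Q plot, and 3 is scale-location.

```r
library(ggplot2)
library(ggfortify)

# Stand-in model on the built-in cars dataset, not the course's fish data
mdl <- lm(dist ~ speed, data = cars)

# which = 1: residuals vs. fitted values
# which = 2: Q-Q plot
# which = 3: scale-location
p_qq <- autoplot(mdl, which = 2)
p_qq
```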
8. autoplot() with the perch model
Here's an example with the perch model. The which argument is set to one to three, so all three plots are drawn. nrow and ncol determine the layout of the plots.
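The call looks roughly like this. Since the perch data isn't included here, the sketch fits a stand-in model on the built-in cars dataset; with the course data, you would pass the perch model object instead. The layout of three rows and one column is an assumption chosen for illustration.

```r
library(ggplot2)
library(ggfortify)

# Stand-in model on the built-in cars dataset; swap in the perch model if you have it
mdl <- lm(dist ~ speed, data = cars)

# Draw plots 1 to 3 (residuals vs. fitted, Q-Q, scale-location),
# stacked in a single column
p_all <- autoplot(mdl, which = 1:3, nrow = 3, ncol = 1)
p_all
```

Stacking the plots in one column keeps them wide, which makes the trend lines easier to compare.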
9. 'Autoplots, roll out!' -Plotimus Prime
Autoplots, roll out!