Model diagnostics

1. Model diagnostics

You've come a long way, but our work isn't finished once we have built the model. The next step is using common model diagnostics to confirm our model is behaving well.

2. Introduction to model diagnostics

After we have picked a final model, or a final few models, we should ask how good they are. This is a key part of the model building life cycle.

3. Residuals

To diagnose our model we focus on the residuals on the training data. The residuals are the differences between our model's one-step-ahead predictions and the real values of the time series.

4. Residuals

In statsmodels, the residuals over the training period can be accessed using the .resid attribute of the results object. These are stored as a pandas Series.

5. Mean absolute error

We might like to know, on average, how large the residuals are, and so how far our predictions are from the true values. To answer this we can calculate the mean absolute error of the residuals. We can do this in Python using the numpy.abs and numpy.mean functions.
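A small worked example of the mean absolute error follows. The residual values here are made up so the arithmetic is easy to check by hand; in practice we would use the residuals from our own results object instead.

```python
import numpy as np
import pandas as pd

# Hypothetical residuals standing in for the fitted model's residuals.
residuals = pd.Series([0.5, -1.0, 0.25, -0.75, 1.5])

# Take absolute values first so positive and negative errors do not cancel out,
# then average them: (0.5 + 1.0 + 0.25 + 0.75 + 1.5) / 5
mae = np.mean(np.abs(residuals))
print(mae)  # 0.8
```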

6. Plot diagnostics

For an ideal model, the residuals should be uncorrelated white Gaussian noise centered on zero. The rest of our diagnostics will help us to see if this is true. We can use the results object's .plot_diagnostics() method to generate four common plots for evaluating this. These are shown on the right.

7. Residuals plot

One of the four plots shows the one-step-ahead standardized residuals. If our model is working correctly, there should be no obvious structure in the residuals.

8. Residuals plot

Here the plot on the left has no obvious pattern, but the plot on the right does.

9. Histogram plus estimated density

Another of the four plots shows us the distribution of the residuals. The histogram shows us the measured distribution; the orange line shows a smoothed version of this histogram; and the green line shows a normal distribution. If our model is good, the orange and green lines should be almost the same. Here, the plot on the left looks fine, but the plot on the right doesn't.

10. Normal Q-Q

The normal Q-Q plot is another way to show how the distribution of the model residuals compares to a normal distribution. If our residuals are normally distributed then all the points should lie along the red line, except perhaps some values at either end.

11. Correlogram

The last plot is the correlogram, which is just an ACF plot of the residuals rather than the data. 95% of the correlations for lag greater than zero should not be significant. If there is significant correlation in the residuals, it means that there is information in the data that our model hasn't captured.

12. Summary statistics

Some of these plots also have accompanying test statistics in the tables produced by the results object's .summary() method. Prob(Q) is the p-value associated with the null hypothesis that the residuals have no correlation structure. Prob(JB) is the p-value associated with the null hypothesis that the residuals are normally distributed. If either p-value is less than 0.05, we reject the corresponding null hypothesis.

13. Let's practice!

Now let's use some of these new tools in practice!

ARIMA Models in Python