1. Intro to ACF and PACF
Now you know how to fit ARIMA models and make forecasts, but how do we choose which ARIMA model to fit?
2. Motivation
The model order is very important to the quality of forecasts. Here we have fit different models to the same dataset and then made forecasts using each.
The mean predictions of the forecasts are shown as orange lines and you can see that they are very different.
3. ACF and PACF
One of the main ways to identify the correct model order is by using the autocorrelation function, the ACF, and the partial autocorrelation function, the PACF.
4. What is the ACF
The autocorrelation function at lag-1 is the correlation between a time series and the same time series offset by one step.
The autocorrelation at lag-2 is the correlation between a time series and itself offset by two steps.
And so on.
When we talk about the autocorrelation function, we mean the set of correlation values for different lags.
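As a minimal sketch of this definition, the snippet below computes the correlation between a series and a shifted copy of itself with NumPy. The helper name `autocorr_at_lag` and the toy series are made up for illustration, and library estimators normalize slightly differently, so their values will not match these exactly.

```python
import numpy as np

def autocorr_at_lag(x, lag):
    """Pearson correlation between x and x shifted by `lag` steps."""
    x = np.asarray(x, dtype=float)
    if lag == 0:
        return 1.0
    # Pair each value with the value `lag` steps earlier.
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

# Toy series (hypothetical data): a steadily rising line is perfectly
# correlated with any shifted copy of itself.
series = np.arange(20, dtype=float)

# The autocorrelation function is the set of values over a range of lags.
acf_values = [autocorr_at_lag(series, k) for k in range(4)]
print(acf_values)
```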
5. What is the ACF
We can plot the autocorrelation function to get an overview of the data.
The bars show the ACF values at increasing lags. If these values are small and lie inside the blue shaded region, then they are not statistically significant.
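To make the idea of the shaded region concrete, here is a rough sketch that flags lags whose ACF value falls outside an approximate 95% band. The bound of 1.96 divided by the square root of the series length is a common rule of thumb and an assumption here; a plotting library may compute its band differently. The function names and the sine-wave data are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (common-mean estimator)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[k:] * centered[:len(x) - k]) / denom
        for k in range(max_lag + 1)
    ])

def significant_lags(x, max_lag):
    """Lags 1..max_lag whose |ACF| exceeds the approximate 95% bound."""
    bound = 1.96 / np.sqrt(len(x))  # rule-of-thumb bound (an assumption)
    acf_vals = sample_acf(x, max_lag)
    flagged = [k for k in range(1, max_lag + 1) if abs(acf_vals[k]) > bound]
    return flagged, bound

# Toy data: a slow sine wave is strongly correlated with itself at short lags.
t = np.arange(200)
wave = np.sin(2 * np.pi * t / 50)
lags, bound = significant_lags(wave, max_lag=5)
print(lags, bound)
```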
6. What is the PACF
The partial autocorrelation is the correlation between a time series and the lagged version of itself after we subtract the effect of correlation at smaller lags. So it is the correlation associated with just that particular lag.
The partial autocorrelation function is this series of values and we can plot it to get another view of the data.
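One way to see what "subtracting the effect of smaller lags" means is to estimate the partial autocorrelation by regression: the value at a given lag is the coefficient on that lag when the series is regressed on all lags up to and including it. The sketch below does this with NumPy on a simulated AR(1) series; the function name, the coefficient 0.7, and the simulated data are illustrative assumptions, and library routines may use a different estimator.

```python
import numpy as np

def pacf_by_regression(x, lag):
    """Partial autocorrelation at `lag`: the coefficient on x[t-lag] when
    x[t] is regressed on x[t-1], ..., x[t-lag] plus an intercept."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column j holds the series shifted by j+1 steps.
    lagged = np.column_stack([x[lag - j - 1 : n - j - 1] for j in range(lag)])
    design = np.column_stack([np.ones(n - lag), lagged])
    coefs, *_ = np.linalg.lstsq(design, x[lag:], rcond=None)
    return coefs[-1]

# Simulated AR(1) series (hypothetical): each value depends only on the last.
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
ar1 = np.zeros(5000)
for t in range(1, 5000):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

pacf_lag1 = pacf_by_regression(ar1, 1)  # should be near the AR coefficient
pacf_lag2 = pacf_by_regression(ar1, 2)  # should be near zero
print(pacf_lag1, pacf_lag2)
```

The ordinary autocorrelation at lag-2 of this series would be clearly non-zero, because lag-2 inherits correlation through lag-1. The partial value removes that inherited part.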
7. Using ACF and PACF to choose model order
By comparing the ACF and PACF for a time series we can deduce the model order.
If the amplitude of the ACF tails off with increasing lag and the PACF cuts off after some lag p, then we have an AR(p) model.
This plot shows an AR(2) model.
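The AR pattern can be checked against theory. The sketch below uses the `ArmaProcess` class from statsmodels to get the theoretical ACF and PACF of an AR(2) process; the coefficients 0.5 and 0.3 are hypothetical values chosen only so that the process is stationary.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# AR(2): y[t] = 0.5*y[t-1] + 0.3*y[t-2] + e[t]  (coefficients are hypothetical).
# statsmodels expects lag-polynomial form, so the AR coefficients are negated.
ar2 = ArmaProcess(ar=[1, -0.5, -0.3], ma=[1])

theoretical_acf = ar2.acf(lags=8)    # index k holds lag k (lag 0 is 1.0)
theoretical_pacf = ar2.pacf(lags=8)

print(np.round(theoretical_acf, 3))   # decays gradually: tails off
print(np.round(theoretical_pacf, 3))  # near zero beyond lag 2: cuts off
```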
8. Using ACF and PACF to choose model order
If the amplitude of the ACF cuts off after some lag q and the amplitude of the PACF tails off, then we have an MA(q) model.
This is an MA(2) model.
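The cut-off in the ACF of an MA(q) process follows directly from its definition: values more than q steps apart share no noise terms, so their correlation is exactly zero. As a small sketch, the function below evaluates the standard closed-form autocorrelation of an MA process; the function name and the coefficients 0.5 and 0.3 are hypothetical.

```python
import numpy as np

def ma_theoretical_acf(thetas, max_lag):
    """Theoretical ACF of y[t] = e[t] + thetas[0]*e[t-1] + ... for lags 0..max_lag."""
    psi = np.concatenate([[1.0], np.asarray(thetas, dtype=float)])
    q = len(psi) - 1
    # Autocovariance at lag k is the sum of products of weights k apart.
    acov = np.array([
        np.sum(psi[: q + 1 - k] * psi[k:]) if k <= q else 0.0
        for k in range(max_lag + 1)
    ])
    return acov / acov[0]

acf_ma2 = ma_theoretical_acf([0.5, 0.3], max_lag=5)
print(np.round(acf_ma2, 3))  # non-zero up to lag 2, then exactly zero
```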
9. Using ACF and PACF to choose model order
If both the ACF and PACF tail off, then we have an ARMA model. In this case we can't deduce the model orders p and q from the plot.
10. Using ACF and PACF to choose model order
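You can refer to the following table when analyzing the ACF and PACF.

        AR(p)                  MA(q)                  ARMA(p,q)
ACF     Tails off              Cuts off after lag q   Tails off
PACF    Cuts off after lag p   Tails off              Tails off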
11. Implementation in Python
In the statsmodels package there are two functions to make plots of the ACF and the PACF. These are the plot_acf and plot_pacf functions. We import them from statsmodels.graphics.tsaplots.
To use them, we start by creating a figure with two subplots.
We pass each function the time series DataFrame and the maximum number of lags we would like to see. We also tell it whether to show the autocorrelation at lag-0. The ACF and PACF at lag-0 always have a value of one, so we set the zero argument to False to simplify the plot. Finally, we pass it the axis to plot on.
The plot_pacf function works in the same way.
12. Implementation in Python
Here are the plots we generated.
13. Over/under differencing and ACF and PACF
The time series must be made stationary before making these plots.
If the ACF values are high and tail off very slowly, this is a sign that the data is non-stationary, so it needs to be differenced.
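A quick numerical illustration of this, assuming a simulated random walk as the non-stationary series (the data and the helper name `lag1_autocorr` are made up for the example):

```python
import numpy as np

def lag1_autocorr(x):
    """Correlation between a series and itself shifted by one step."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[1:], x[:-1])[0, 1]

# Hypothetical non-stationary data: a random walk (cumulative sum of noise).
rng = np.random.default_rng(42)
random_walk = np.cumsum(rng.normal(size=1000))

before = lag1_autocorr(random_walk)          # expected to be close to 1
after = lag1_autocorr(np.diff(random_walk))  # expected to be close to 0
print(before, after)
```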
14. Over/under differencing and ACF and PACF
If the autocorrelation at lag-1 is strongly negative, this is a sign that we have taken the difference too many times.
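The effect is easy to reproduce. In this sketch we difference a series that is already stationary (simulated white noise, a hypothetical stand-in) and look at the lag-1 autocorrelation of the result.

```python
import numpy as np

rng = np.random.default_rng(7)
stationary = rng.normal(size=2000)      # toy data: already stationary
over_differenced = np.diff(stationary)  # one difference too many

lag1 = np.corrcoef(over_differenced[1:], over_differenced[:-1])[0, 1]
# In theory, differenced white noise has a lag-1 autocorrelation of -0.5,
# so the sample estimate should land in that neighborhood.
print(lag1)
```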
15. Let's practice!
Now it's time to get down to some data. Let's practice!