1. Autocorrelation and Partial Autocorrelation
Congratulations on getting this far! The last two chapters covered the basics of time series visualization and analysis, and you should now feel comfortable plotting and summarizing your time series data. In this chapter, you will learn how to extract and interpret patterns in time series data. You will discover the concepts of autocorrelation and partial autocorrelation, and learn how to detect and visualize seasonality, trend and noise in time series data.
2. Autocorrelation in time series data
Autocorrelation is a measure of the correlation between your time series and a delayed copy of itself. For example, an autocorrelation of order 3 returns the correlation between a time series at points t_1, t_2, t_3, and its own values lagged by 3 time points, i.e. t_4, t_5, t_6. Autocorrelation is used to find repeating patterns or periodic signals in time series data. The principle of autocorrelation can be applied to any signal, not just time series. Therefore, it is common to encounter the same principle in other fields, where it is also referred to as serial correlation; the closely related, unnormalized version of this measure is called autocovariance.
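To make this concrete, here is a minimal sketch using hypothetical data: a noisy sine wave repeating every 12 time points, where pandas' built-in autocorr() method computes the lag-k autocorrelation directly. The series and the lag values are illustrative, not from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical example: a noisy sine wave with a period of 12 points,
# so we expect a strong positive autocorrelation at lag 12
rng = np.random.default_rng(42)
t = np.arange(120)
series = pd.Series(np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(120))

# pandas computes the lag-k autocorrelation directly
print(series.autocorr(lag=12))  # close to 1: the pattern repeats every 12 points
print(series.autocorr(lag=6))   # close to -1: half a period apart, peaks align with troughs
```

This shows why autocorrelation is useful for finding repeating patterns: the lag at which the autocorrelation peaks reveals the period of the signal.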
3. Statsmodels
In order to compute and plot the autocorrelation of a time series, we need to introduce a new Python library called statsmodels. As its documentation states, "statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration."
4. Plotting autocorrelations
We can leverage the plot_acf() function in statsmodels to measure and plot the autocorrelation of a time series. In the plot_acf() function, the maximum number of lags for which to compute the autocorrelation values can be specified using the lags parameter. In this case, we set the lags parameter to 40.
5. Interpreting autocorrelation plots
Because autocorrelation is a correlation measure, the autocorrelation coefficient can only take values between -1 and 1. An autocorrelation of 0 indicates no correlation, while values of 1 and -1 indicate perfect positive and negative correlation, respectively.
In order to help you assess the significance of autocorrelation values, the plot_acf() function also computes and returns margins of uncertainty, which are represented in the graph as blue shaded regions. Values outside these regions can be interpreted as the time series having a statistically significant relationship with a lagged version of itself.
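The same uncertainty margins can be obtained numerically. As a sketch, the acf() function in statsmodels.tsa.stattools accepts an alpha parameter and returns confidence intervals alongside the autocorrelation values; here it is applied to hypothetical white noise, where almost no lag should be significant:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Hypothetical data: pure white noise has no real autocorrelation
rng = np.random.default_rng(1)
white_noise = rng.standard_normal(500)

# alpha=0.05 also returns 95% confidence intervals -- the numeric
# counterpart of the blue shaded regions in the plot
acf_values, conf_int = acf(white_noise, nlags=40, alpha=0.05)

# A lag is statistically significant if zero falls outside its interval
significant = [lag for lag in range(1, 41)
               if conf_int[lag, 0] > 0 or conf_int[lag, 1] < 0]
print(significant)  # for white noise, typically only a few lags by chance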
6. Partial autocorrelation in time series data
Going beyond autocorrelation, the partial autocorrelation also measures the correlation between a time series and lagged versions of itself, but it extends this idea by removing the effect of the intermediate time points. For example, a partial autocorrelation function of order 3 returns the correlation between our time series at points t_1, t_2, t_3 and its values lagged by 3 time points (t_4, t_5, t_6), but only after removing all effects attributable to lags 1 and 2.
7. Plotting partial autocorrelations
Just like with autocorrelation, we use the statsmodels library to compute and plot the partial autocorrelation of a time series. This example uses the plot_pacf() function to calculate and plot the partial autocorrelation for the first 40 lags of the time series contained in the DataFrame df.
8. Interpreting partial autocorrelation plots
If partial autocorrelation values are close to 0, you can conclude that values are not correlated with one another. Conversely, partial autocorrelation values close to 1 or -1 indicate that strong positive or negative correlations exist between the lagged observations of the time series. If partial autocorrelation values fall outside the margins of uncertainty, marked by the blue shaded regions, you can conclude that the observed partial autocorrelation values are statistically significant.
9. Let's practice!
Now that you have been introduced to the principles of autocorrelation and partial autocorrelation, let's put this into practice!