ARIMA models

1. ARIMA models

ARIMA stands for Autoregressive Integrated Moving Average Models. Let's break that down into its parts.

2. ARIMA models

First, an Autoregressive model is simply a regression of a time series against the lagged values of that series. The last p observations are used as predictors in the regression equation. A Moving Average model can also be thought of as a regression. But instead of regressing against lagged observations, we regress against lagged errors. The last q errors are used as predictors in the equation.
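Written out, the two regressions just described might look like the following. This is a sketch in notation chosen for illustration: y_t is the observation at time t, e_t is the error at time t, c is the constant, and the phi and theta coefficients are assumed names that do not appear in the narration itself.

```latex
% Autoregressive model of order p: regress on the last p observations
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + e_t

% Moving average model of order q: regress on the last q errors
y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q}
```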

3. ARIMA models

When we put these together we have an ARMA model, where the last p observations and the last q errors are all used as predictors in the equation. ARMA models only work with stationary data, so we need to difference the data first. That brings me to the I in ARIMA, which stands for Integrated. Integration is the opposite of differencing. If our time series needs to be differenced d times to make it stationary, then the resulting model is called an ARIMA(p,d,q) model. So to apply an ARIMA model to data, we need to decide on the values of p, d and q, and whether or not to include the constant c (the intercept in these equations). Fortunately, there is an automated procedure to do that.
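To see why integration is described as the opposite of differencing, here is a minimal base R sketch. The series is simulated (a random walk with drift, chosen only for illustration, not data from the course), and the variable names are hypothetical.

```r
# Simulated random walk with drift: non-stationary by construction.
set.seed(42)
y <- cumsum(0.5 + rnorm(100))

# Differencing once (d = 1) gives the period-to-period changes.
dy <- diff(y, differences = 1)

# Integrating the differences, starting from the first observation,
# rebuilds the original series.
y_back <- diffinv(dy, differences = 1, xi = y[1])

# Compare the rebuilt series with the original (up to floating-point error).
all.equal(y, y_back)
```

An ARIMA(p,1,q) model fits an ARMA(p,q) model to something like dy and then integrates the result to get back to the scale of y.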

4. US net electricity generation

Let's apply it to these data on annual US net electricity generation. From your experience of using differencing, you would probably guess this series needs one difference to make it stationary.

5. US net electricity generation

The auto.arima() function chooses the ARIMA model given the time series. In this case, it has selected an ARIMA(2,1,2) model with drift. So the data has been differenced once, and then 2 past observations and 2 past errors have been used in the equation. The drift here refers to the coefficient c. It is called a drift coefficient when there is differencing. The rest of the output tells you about the values of the parameters, and other model information. Notice that the AICc value is given, just as it was for ETS models. auto.arima() selects the values of p and q by minimizing the AICc value, just like the ets() function did. However, you cannot compare an ARIMA AICc value with an ETS AICc value. You can only compare AICc values between models of the same class. You also can't compare AICc values between models with different amounts of differencing.
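The call behind that output might look like the sketch below. It assumes the fpp2 package is installed and supplies the usnetelec series (annual US net electricity generation) along with the forecast package. The model selected can vary with the package version, so no output is shown here.

```r
# Assumption: fpp2 is installed; it attaches forecast and the usnetelec data.
library(fpp2)

# Let auto.arima() choose p, d, q and whether to include the constant.
fit <- auto.arima(usnetelec)

# Coefficients, sigma^2, log likelihood, AIC, AICc and BIC.
summary(fit)

# The selected orders (p, d, q) and the AICc that was minimized.
arimaorder(fit)
fit$aicc
```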

6. US net electricity generation

The resulting forecasts look pretty good, don't they? The upward trend has been captured nicely.
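A plot like this one can be produced along the following lines. Again this is a sketch that assumes fpp2 provides usnetelec. The 10-year horizon is an arbitrary choice for illustration, not a value taken from the slide.

```r
# Assumption: fpp2 is installed; it attaches forecast, ggplot2 and usnetelec.
library(fpp2)

fit <- auto.arima(usnetelec)

# Forecast 10 years ahead (the horizon is an illustrative choice).
fc <- forecast(fit, h = 10)

# Plot the history, the point forecasts and the prediction intervals.
autoplot(fc)
```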

7. How does auto.arima() work?

auto.arima() uses an algorithm I developed with Yeasmin Khandakar. It chooses the number of differences using a tool called a unit root test, and it selects the values of p and q by minimizing the AICc. The parameters are estimated using maximum likelihood estimation. One issue here is that the model space is very large, as p and q can take any non-negative integer values, so to save time we only try some of the possible models. That means it is possible that auto.arima() returns a model that is not actually the one with the minimum AICc value. Sometimes you can beat it, as you will see in a later exercise. There is a whole DataCamp course on ARIMA models, which I recommend you take if you want to delve deeper into this class of models.
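The sketch below separates those steps and shows one way to ask for a wider search. It assumes fpp2 provides usnetelec. ndiffs(), stepwise and approximation are part of the forecast package, but the variable names are hypothetical and the results depend on the package version.

```r
# Assumption: fpp2 is installed; it attaches forecast and usnetelec.
library(fpp2)

# Number of differences suggested by a unit root test.
ndiffs(usnetelec)

# Default search: a stepwise procedure that tries only some models.
fit_fast <- auto.arima(usnetelec)

# Wider search: turn off the stepwise shortcut and the approximation.
# This is slower, and still limited by the maximum orders auto.arima() allows.
fit_full <- auto.arima(usnetelec, stepwise = FALSE, approximation = FALSE)

# Compare the AICc values of the two selected models.
c(stepwise = fit_fast$aicc, full = fit_full$aicc)
```

The AICc comparison only makes sense if both fits use the same number of differences, which arimaorder() can confirm.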

8. Let's practice!

For now, let's get started using auto.arima() to do the work for us.

This exercise is part of the course

Forecasting in R

