1. ARIMA Time Series 101
If we are going to forecast our sales with time series modeling, we need to have a quick look at one of the foundational models of time series - ARIMA models.
Now you could spend an entire course on this.
2. Other DataCamp time series content
In fact, DataCamp has some really good courses on time series. If you are really interested in more details about these models I would highly recommend these courses. For now, I will give you a brief overview of ARIMA modeling in this video.
3. What is an ARIMA Model?
Let's quickly break down what we mean when we say ARIMA modeling.
AR stands for autoregressive. MA stands for moving average. I stands for integrated. Let's actually start with the integrated piece first.
4. Integrated - Stationarity
In time series models we typically assume dependency across time. Otherwise, why would we care about time at all? The big questions are whether this dependency exists and how long it lasts.
A rather crude definition of stationarity is when effects in a data set dissipate as time goes on. What happens today has less and less effect on the data the further we get from today.
The best long-term prediction for stationary data is the historical mean of the series. A historical average wouldn't be a good prediction for something with a seasonal wave or a persistent upward trend.
How do we make our data have stationarity? Typically, through differencing.
5. Differencing
Differencing your data means looking at the change from one time period to another or the DIFFERENCE between them.
A single time period difference can remove a trend. Seasonal differences can remove seasonal effects. For example, monthly sales data might have an annual (or 12-period) seasonal wave.
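As a minimal sketch of both kinds of differencing, here is base R's `diff()` applied to hypothetical data (the trending and monthly series below are made up for illustration):

```r
# Hypothetical data: a linear trend plus noise
set.seed(42)
trending <- 1:100 + rnorm(100)

# First difference: the change from one period to the next
d1 <- diff(trending)              # removes the trend; length drops by 1

# Hypothetical monthly data with an annual (12-period) wave
monthly <- ts(sin(2 * pi * (1:60) / 12) + rnorm(60, sd = 0.1),
              frequency = 12)

# Seasonal difference: subtract the value from 12 months earlier
d12 <- diff(monthly, lag = 12)    # removes the seasonal wave; length drops by 12
```

Note that each difference shortens the series: a lag-1 difference loses one observation, a lag-12 difference loses twelve.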
6. Autoregressive (AR) Piece
Once you have stationary data you can move to the other pieces of the ARIMA model.
Autoregressive models deal with previous values of your data. For example, last month's sales have some residual effect on what sales look like this month. These previous values are called lags and you can have any number of them in your model. They are called long-memory models because these effects slowly dissipate across time.
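To see the "long memory" in action, you can simulate an AR(1) process with base R's `arima.sim()` (the coefficient 0.7 below is just an illustrative choice) and watch its autocorrelations decay gradually across lags:

```r
# Simulate an AR(1) process: this month = 0.7 * last month + new noise
set.seed(123)
ar1 <- arima.sim(model = list(ar = 0.7), n = 500)

# Autocorrelations shrink slowly as the lag grows - the "long memory"
acf_vals <- acf(ar1, plot = FALSE)$acf  # acf_vals[1] is lag 0, acf_vals[2] is lag 1, ...
```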
7. Moving Average (MA) Piece
Moving average models deal with previous, hidden "shocks" or errors in your model. This is harder to conceptualize. Essentially, how "abnormal" your previous value was compared to what was predicted last month has some residual effect on this month's sales. They are called short-memory models because these effects quickly disappear completely.
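The "short memory" is easy to see in a simulated MA(1) process: the shock from one period carries over only to the next, so the autocorrelation essentially cuts off after lag 1 (the coefficient 0.6 is an illustrative choice):

```r
# Simulate an MA(1) process: this month's value = this month's shock
# plus 0.6 * last month's shock
set.seed(123)
ma1 <- arima.sim(model = list(ma = 0.6), n = 500)

# Autocorrelation is sizable at lag 1, then drops to roughly zero
acf_vals <- acf(ma1, plot = FALSE)$acf
```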
8. Training vs. Validation
Before we model we still need to split our data into training and validation. It is always good practice to compare our predictions against real data to see how good our model REALLY is.
9. Training vs. Validation
We are going to combine both products in the mountain region - high end and low end - into total mountain sales. Next, we are just splitting off the 2017 data for our sales in the mountain region for validation.
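A sketch of this kind of split using base R's `window()` - the `total_sales` series below is hypothetical stand-in data, not the actual mountain-region sales from the course:

```r
# Hypothetical monthly series standing in for total mountain sales,
# running January 2013 through December 2017
set.seed(7)
total_sales <- ts(rnorm(60, mean = 100, sd = 10),
                  start = c(2013, 1), frequency = 12)

# Train on everything through 2016; hold out 2017 for validation
train <- window(total_sales, end = c(2016, 12))
valid <- window(total_sales, start = c(2017, 1))
```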
10. How to Build ARIMA Models?
Once we have our training data, R has a wonderful function called auto.arima(). This function will try to estimate the ARIMA model you need for your data. As you can see here, the function tells us all the pieces of the ARIMA model. It says that the AR piece (the first piece) has four previous values of Y that should be in the model. It then says that there is no differencing necessary for the data - the second piece is zero. Finally, there is one moving average piece to our model as well. If you are really interested, the coefficients for these are listed as well.
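`auto.arima()` lives in the `forecast` package, so the call looks like `forecast::auto.arima(train)`. With base R alone you can fit the model it identified - an ARIMA(4,0,1) - directly with `arima()`. The simulated series below is illustrative, not the course's sales data:

```r
# Simulate a series with 4 AR pieces and 1 MA piece (illustrative coefficients)
set.seed(99)
y <- arima.sim(model = list(ar = c(0.5, 0.1, 0.05, 0.02), ma = 0.3),
               n = 300)

# Fit an ARIMA(4,0,1): four AR lags, no differencing, one MA term
fit <- arima(y, order = c(4, 0, 1))

coef(fit)   # the estimated AR, MA, and intercept coefficients
```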
11. Let's practice!
Wow! That was a lot to learn about time series really quickly. Let's solidify those concepts.