1. Introduction to time series and stationarity
Welcome to this course on forecasting using ARIMA models in Python. My name is James Fulton and I will be your guide as you learn how to predict the future of time series.
2. Motivation
Time series data is everywhere in this world. It is used in a wide variety of fields. There are many datasets for which we would like to be able to predict the future. Knowing the future of obesity rates could help us intervene now for public health; predicting consumer energy demands could help power stations run more efficiently; and predicting how the population of a city will change could help us build the infrastructure we will need.
3. Course content
We can forecast all of these datasets using time series models, and ARIMA models are one of the go-to time series tools.
You will learn how to fit these models and how to optimize them.
You will learn how to make forecasts of important real-world data and, just as crucially, how to find the limits of your forecasts.
4. Loading and plotting
Let's start by examining a time series. We can load a time series from a CSV file using pandas. Here we set the date column as the index and parse the dates into the datetime data type.
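As a minimal sketch of the loading step just described: the column names `date` and `value` and the in-memory CSV text are assumptions made so the example is self-contained. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """date,value
2019-01-01,10.0
2019-01-02,12.5
2019-01-03,11.0
"""

# index_col makes the 'date' column the index;
# parse_dates=True converts that index to datetime values
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

print(df.index.dtype)
print(df.head())
```

Having a datetime index is what later allows the data to be sliced by date.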
5. Trend
To plot the data we make a pyplot figure and use the DataFrame's dot-plot method.
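The plotting step might look roughly like this. The DataFrame here is made-up illustrative data, and the `Agg` backend is chosen only so the sketch runs without a display; in an interactive session you would normally call `plt.show()` at the end.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily series with a datetime index
dates = pd.date_range("2019-01-01", periods=30, freq="D")
df = pd.DataFrame({"value": np.arange(30, dtype=float)}, index=dates)

# Create a pyplot figure and axes, then draw the DataFrame onto those axes
fig, ax = plt.subplots()
df.plot(ax=ax)
ax.set_xlabel("date")
ax.set_ylabel("value")
```

Passing `ax=ax` tells the DataFrame's plot method to draw on the figure we created rather than making a new one.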
One important feature of a time series is its trend. A positive trend means the series generally slopes upward: the values increase with time. Similarly, a negative trend is where the values decrease with time.
6. Seasonality
Another important feature is seasonality. A seasonal time series has patterns that repeat at regular intervals, for example high sales every weekend.
7. Cyclicality
In contrast, cyclicality is where there is a repeating pattern but no fixed period.
8. White noise
White noise is an important concept in time series and ARIMA models. White noise is a series of measurements, where each value is uncorrelated with previous values.
You can think of this like flipping a coin: the outcome of a coin flip doesn't depend on the outcomes of the flips that came before. Similarly, with white noise, each value in the series doesn't depend on the values that came before.
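To make the idea concrete, here is a small sketch that generates white noise and measures how strongly each value is correlated with the one before it. Drawing the values from a normal distribution is an assumption for illustration; white noise only requires that the values be uncorrelated.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible
rng = np.random.default_rng(seed=0)

# 1000 independent draws: each value ignores all the values before it
white_noise = pd.Series(rng.normal(loc=0.0, scale=1.0, size=1000))

# Correlation between the series and itself shifted by one step
lag1 = white_noise.autocorr(lag=1)
print(f"lag-1 autocorrelation: {lag1:.3f}")
```

For white noise this lag-1 autocorrelation should be close to zero, although it will rarely be exactly zero in a finite sample.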
9. Stationarity
Before we can model a time series, it must be stationary. Stationary means that the distribution of the data doesn't change with time. For a time series to be stationary, it must fulfill three criteria. These are:
The series has zero trend: it isn't growing or shrinking.
10. Stationarity
The variance is constant. The average distance of the data points from the zero line isn't changing.
11. Stationarity
And the autocorrelation is constant. How each value in the time series is related to its neighbors stays the same.
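One informal way to get a feel for the first two criteria is to compare summary statistics from the early and late parts of a series. The sketch below does this for white noise and for the same noise with a linear trend added. The helper `half_summary` and the synthetic data are hypothetical, and this is a rough eyeball check rather than a formal statistical test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
n = 1000

noise = pd.Series(rng.normal(size=n))     # no trend, constant variance
trending = noise + 0.01 * np.arange(n)    # same noise plus a linear trend


def half_summary(series):
    """Return (mean, variance) of the first half and of the second half."""
    mid = len(series) // 2
    first, second = series.iloc[:mid], series.iloc[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())


(noise_m1, noise_v1), (noise_m2, noise_v2) = half_summary(noise)
(trend_m1, trend_v1), (trend_m2, trend_v2) = half_summary(trending)

# How much the mean moves between the two halves
noise_shift = noise_m2 - noise_m1
trend_shift = trend_m2 - trend_m1

print(f"noise:    mean shift {noise_shift:.2f}, variances {noise_v1:.2f} / {noise_v2:.2f}")
print(f"trending: mean shift {trend_shift:.2f}, variances {trend_v1:.2f} / {trend_v2:.2f}")
```

The mean of the trending series moves substantially between the two halves, which breaks the zero-trend criterion, while the mean of the plain noise stays roughly where it was.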
12. Train-test split
Generally, in machine learning, you have a training set which you fit your model on, and a test set, which you will test your predictions against. Time series forecasting is just the same.
Our train-test split will be different, however. We use past values to make future predictions, so we need to split the data in time. We train on the data earlier in the time series and test on the data that comes later.
We can split a time series at a given date using the DataFrame's dot-loc method.
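The split in time described here can be sketched as follows. The data, the split date, and the names `df_train` and `df_test` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up daily data covering 2018 and 2019
dates = pd.date_range("2018-01-01", periods=730, freq="D")
df = pd.DataFrame({"value": np.arange(730, dtype=float)}, index=dates)

# Label-based slicing with .loc on a datetime index includes both endpoints,
# so the two slices below do not overlap
df_train = df.loc[:"2018-12-31"]   # everything up to the end of 2018
df_test = df.loc["2019-01-01":]    # everything from the start of 2019

print(len(df_train), len(df_test))
```

Unlike a random split, every training observation here comes before every test observation, which mirrors how forecasts are used in practice.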
13. Let's Practice!
We've learned the basics of stationarity and train-test splitting. Let's get used to these in practice.