1. Training and test sets
To really test how good your forecasting method is, there is no substitute for actually using it to forecast future observations.
2. Training and test sets
But you probably don't want to wait around for a few years to see how it goes.
3. Training and test sets
Instead, you can hide some observations at the end of the series, then try to forecast them. No peeking!
4. Training and test sets
If you look at the hidden observations, and that influences your forecasts, then it is not a fair test.
The observations used to build your forecasts are called the "training set". The remaining hidden observations form the "test set".
It is easy to build a very complicated model that fits the training data with tiny residuals and looks like a great fit, but then produces terrible forecasts. This is called over-fitting. Checking forecast performance on the test set helps guard against over-fitting the training set.
5. Example: Saudi Arabian oil production
In this example, we split the data into training and test sets using the window() function. We keep the last 10 years of data for testing and compute naive forecasts.
Because there are no parameters associated with a naive forecast, there actually isn't much point in using a test set in this example. But the principle is important, and we will use this approach when we have more complicated forecasting models.
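A minimal sketch of this split in R, assuming the annual oil series (Saudi Arabian oil production, ending in 2013) from the fpp2 package:

```r
# Load the forecast tools and the oil data (assumed to come from fpp2)
library(fpp2)

# Training set: everything up to 2003; the last 10 years form the test set
training <- window(oil, end = 2003)
test     <- window(oil, start = 2004)

# Naive forecasts over the length of the test set
fc <- naive(training, h = length(test))
```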
6. Forecast errors
Forecast errors are the differences between the test set observations and the point forecasts.
Forecast errors differ from residuals in two ways. First, residuals are computed on the training set, while forecast errors are computed on the test set. Second, residuals are based on one-step forecasts, while forecast errors can come from any forecast horizon.
We compute the accuracy of our method using the forecast errors calculated on the test data.
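Continuing the sketch above, the forecast errors are simply the held-out observations minus the point forecasts (stored in the mean element of a forecast object):

```r
# Forecast errors on the test set: observations minus point forecasts
fc_errors <- test - fc$mean

# Residuals, by contrast, come from one-step forecasts on the training set
res <- residuals(fc)
```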
7. Measures of forecast accuracy
There are a number of ways to measure forecast accuracy. We can take the average absolute error, the average squared error, or the average percentage error. All of these are widely used, and their equations are given below.
But there are problems with these simple measures. If we want to compare forecast accuracy between two series on very different scales, we can't compare the MAE or MSE because their size depends on the scale of the data.
MAPE is better for comparisons, but only if our data are all positive and have no zeros or small values. MAPE also assumes there is a natural zero, so it can't be used with temperature forecasts, for example, as the Fahrenheit and Celsius scales have arbitrary zero points.
A solution is the mean absolute scaled error or MASE, which is like the MAE but is scaled so that it can be compared across series.
In all cases, a small value indicates a better forecast.
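For reference, a sketch of the usual definitions, where e_t is a forecast error, y_t the corresponding observation, and Q a scaling statistic computed on the training data (for example, the MAE of naive one-step forecasts):

```latex
\begin{align*}
  \text{MAE}  &= \operatorname{mean}\bigl(\lvert e_t \rvert\bigr) \\
  \text{MSE}  &= \operatorname{mean}\bigl(e_t^2\bigr) \\
  \text{MAPE} &= \operatorname{mean}\bigl(\lvert 100\, e_t / y_t \rvert\bigr) \\
  \text{MASE} &= \operatorname{mean}\bigl(\lvert e_t / Q \rvert\bigr)
\end{align*}
```

Because Q is computed on the training data, MASE is unit-free and can be compared across series on different scales.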
8. The accuracy() command
Once again, R makes our life easy by providing a function that does most of the work for us. The accuracy() function computes all of these measures, plus a few others that we won't discuss here.
The training set measures are based on the residuals, while the test set measures are based on the forecast errors. In most cases, we are interested in the test set error measures.
On their own, these don't tell us much. But when we compare different forecast methods on the same data, these will be very useful in telling us what works and what doesn't.
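Continuing the earlier sketch, passing both the forecast object and the test data returns one row of training-set measures (based on the residuals) and one row of test-set measures (based on the forecast errors):

```r
# Accuracy measures for the naive forecasts on the training and test sets
accuracy(fc, test)
```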
9. Let's practice!
Now it's time for you to practice using the accuracy() function.