Evaluating forecast accuracy of non-seasonal methods
In data science, a training set is a data set that is used to discover possible relationships. A test set is a data set that is used to verify the strength of these potential relationships. When you separate a data set into these parts, you generally allocate more of the data for training, and less for testing.
One function that can be used to create training and test sets is subset(), which returns a subset of a time series where the optional start and end arguments are specified using index values.
> # x is a numerical vector or time series
> # To subset observations from 101 to 500
> train <- subset(x, start = 101, end = 500, ...)
> # To subset the first 500 observations
> train <- subset(x, end = 500, ...)
As you saw in the video, another function, accuracy(), computes various forecast accuracy statistics given the forecasts and the corresponding actual observations. It is smart enough to find the relevant observations if you give it more than the ones you are forecasting.
> # f is an object of class "forecast"
> # x is a numerical vector or time series
> accuracy(f, x, ...)
The accuracy measures provided include root mean squared error (RMSE), which is the square root of the mean squared error (MSE). Minimizing RMSE, which corresponds to increasing accuracy, is equivalent to minimizing MSE.
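For instance, here is a minimal sketch of how RMSE relates to MSE, using toy vectors actual and pred (illustrative placeholders, not the exercise data):
> # Toy vectors for illustration only
> actual <- c(252, 251, 254)
> pred <- c(250, 252, 253)
> mse <- mean((actual - pred)^2)  # mean squared error = 2
> rmse <- sqrt(mse)               # RMSE = sqrt(2), about 1.41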
The pre-loaded time series gold comprises daily gold prices for 1108 days. Here, you'll use the first 1000 days as a training set and compute forecasts for the remaining 108 days. These will be compared to the actual values for those days using the simple forecasting functions naive(), which you used earlier in this chapter, and meanf(), which gives forecasts equal to the mean of all observations. You'll have to specify the argument h (the number of values you want to forecast) for both.
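To see the difference between the two methods, here is a hedged sketch on a toy series (the vector y is made up for illustration):
> # Toy series for illustration only
> library(forecast)
> y <- ts(c(10, 12, 11, 13))
> naive(y, h = 2)$mean  # both forecasts equal the last value: 13
> meanf(y, h = 2)$mean  # both forecasts equal the series mean: 11.5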
Exercise instructions
- Use subset() to create a training set for gold comprising the first 1000 observations. This will be called train.
- Compute forecasts of the test set, containing the remaining data, using naive() and assign this to naive_fc. Set h accordingly.
- Now, compute forecasts of the same test set using meanf() and assign this to mean_fc. Set h accordingly.
- Compare the forecast accuracy statistics of the two methods using the accuracy() function.
- Based on the above results, store the forecasts with the higher accuracy as bestforecasts.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the training data as train
train <- subset(___, end = ___)
# Compute naive forecasts and save to naive_fc
naive_fc <- naive(___, h = ___)
# Compute mean forecasts and save to mean_fc
mean_fc <- meanf(___, h = ___)
# Use accuracy() to compute RMSE statistics
accuracy(___, gold)
___(___, gold)
# Assign one of the two forecasts as bestforecasts
bestforecasts <- ___
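One way to complete the exercise (a sketch rather than the official solution; the final assignment assumes naive_fc shows the lower test-set RMSE in the accuracy() output, so check the printed statistics before choosing):
# Load the forecast package (provides subset() for ts, naive(), meanf(), accuracy())
library(forecast)

# Create the training data as train: the first 1000 observations
train <- subset(gold, end = 1000)

# Compute naive forecasts for the remaining 108 days
naive_fc <- naive(train, h = 108)

# Compute mean forecasts for the same horizon
mean_fc <- meanf(train, h = 108)

# Compare RMSE (and other statistics) on the test set
accuracy(naive_fc, gold)
accuracy(mean_fc, gold)

# Keep the more accurate forecasts (assuming naive_fc has the lower RMSE)
bestforecasts <- naive_fc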