Evaluating forecast accuracy of non-seasonal methods
In data science, a training set is a data set that is used to discover possible relationships. A test set is a data set that is used to verify the strength of these potential relationships. When you separate a data set into these parts, you generally allocate more of the data for training, and less for testing.
One function that can be used to create training and test sets is subset(), which returns a subset of a time series where the optional start and end arguments are specified using index values.
> # x is a numerical vector or time series
> # To subset observations from 101 to 500
> train <- subset(x, start = 101, end = 500, ...)
> # To subset the first 500 observations
> train <- subset(x, end = 500, ...)
As you saw in the video, another function, accuracy(), computes various forecast accuracy statistics given the forecasts and the corresponding actual observations. It is smart enough to find the relevant observations if you give it more than the ones you are forecasting.
> # f is an object of class "forecast"
> # x is a numerical vector or time series
> accuracy(f, x, ...)
The accuracy measures provided include root mean squared error (RMSE), which is the square root of the mean squared error (MSE). Minimizing RMSE, which corresponds to increasing accuracy, is equivalent to minimizing MSE.
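For instance, here is a minimal sketch of how RMSE relates to MSE, using toy vectors actual and pred (illustrative placeholders, not the exercise data):
> # Toy vectors for illustration only
> actual <- c(252, 251, 254)
> pred <- c(250, 252, 253)
> mse <- mean((actual - pred)^2)  # mean squared error = 2
> rmse <- sqrt(mse)               # RMSE = sqrt(2), about 1.41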
The pre-loaded time series gold comprises daily gold prices for 1108 days. Here, you'll use the first 1000 days as a training set and compute forecasts for the remaining 108 days. These will be compared to the actual values for those days using the simple forecasting functions naive(), which you used earlier in this chapter, and meanf(), which gives forecasts equal to the mean of all observations. You'll have to specify the argument h (the number of values you want to forecast) for both.
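To see the difference between the two methods, here is a hedged sketch on a toy series (the vector y is made up for illustration):
> # Toy series for illustration only
> library(forecast)
> y <- ts(c(10, 12, 11, 13))
> naive(y, h = 2)$mean  # both forecasts equal the last value: 13
> meanf(y, h = 2)$mean  # both forecasts equal the series mean: 11.5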
Exercise instructions
- Use subset() to create a training set for gold comprising the first 1000 observations. This will be called train.
- Compute forecasts of the test set, containing the remaining data, using naive() and assign this to naive_fc. Set h accordingly.
- Now, compute forecasts of the same test set using meanf() and assign this to mean_fc. Set h accordingly.
- Compare the forecast accuracy statistics of the two methods using the accuracy() function.
- Based on the above results, store the forecasts with the higher accuracy as bestforecasts.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the training data as train
train <- subset(___, end = ___)
# Compute naive forecasts and save to naive_fc
naive_fc <- naive(___, h = ___)
# Compute mean forecasts and save to mean_fc
mean_fc <- meanf(___, h = ___)
# Use accuracy() to compute RMSE statistics
accuracy(___, gold)
___(___, gold)
# Assign one of the two forecasts as bestforecasts
bestforecasts <- ___
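One way to complete the exercise (a sketch rather than the official solution; the final assignment assumes naive_fc shows the lower test-set RMSE in the accuracy() output, so check the printed statistics before choosing):
# Load the forecast package (provides subset() for ts, naive(), meanf(), accuracy())
library(forecast)

# Create the training data as train: the first 1000 observations
train <- subset(gold, end = 1000)

# Compute naive forecasts for the remaining 108 days
naive_fc <- naive(train, h = 108)

# Compute mean forecasts for the same horizon
mean_fc <- meanf(train, h = 108)

# Compare RMSE (and other statistics) on the test set
accuracy(naive_fc, gold)
accuracy(mean_fc, gold)

# Keep the more accurate forecasts (assuming naive_fc has the lower RMSE)
bestforecasts <- naive_fc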