1. Time series cross-validation
Up to now, we have used this traditional evaluation setup.
2. Time series cross-validation
The problem is that this approach is wasteful. We have a relatively small test set, so we may draw conclusions that work for that test set but are not reliable for future periods.
Time series cross-validation is a solution to this problem.
3. Time series cross-validation
In time series cross-validation, we have a series of training and test sets. Assuming we are interested in one-step forecasts, the setup would look like this. In each row, the blue dots show the training set, and the red dot the test set. The white dots are not used. Each training set contains just one more observation than the previous training set. In this way, many more observations can be used for testing, and we can evaluate how good the method is at one-step forecasting by averaging the errors over all those tiny test sets.
This is known in econometrics as "forecast evaluation on a rolling origin". (For some reason, econometricians never seem to come up with memorable names!) The forecast origin is the time at the end of the training data. And it rolls forward in time. Hence the name.
But I like to call it "time series cross-validation", because it is analogous to cross-validation for non-time-series problems.
We can do something similar for multi-step forecasting,
4. Time series cross-validation
such as 2-steps ahead,
5. Time series cross-validation
and 3-steps ahead.
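The rolling-origin scheme for one-step forecasts can be sketched in a few lines of base R. Everything here is illustrative: the toy series y, the initial training size k, and the choice of the naive method (forecast = last observed value) are my assumptions, not the slide's.

```r
# Rolling-origin evaluation of one-step naive forecasts, by hand.
y <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119)  # toy data
k <- 4                         # size of the first training set
e <- rep(NA_real_, length(y))  # one-step forecast errors

for (i in k:(length(y) - 1)) {
  fc <- y[i]               # naive one-step forecast from origin i
  e[i + 1] <- y[i + 1] - fc  # error on the single test observation
}

mean(e^2, na.rm = TRUE)  # average over all the tiny test sets
```

A multi-step version would simply forecast y[i + h] from origin i instead.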
This looks like a lot of work, but there is a neat R function that does everything for you.
6. tsCV function
tsCV applies a forecasting method on a sequence of training sets computed from a time series. It works like this.
The resulting forecast errors are saved in the object e. There may be some missing values, especially at the start of the series, because it is simply not possible to compute a forecast when the training set is too small.
You need to compute your own error measures when you use this function. I've computed the mean squared error for you here as an example.
This is actually a strange example, because the naive method has no parameters to estimate. Therefore, tsCV will produce the same values as the residuals() function in this case.
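The slide code itself is not shown in this transcript, so here is a hedged reconstruction of the example using the forecast package's tsCV function. The series y is a stand-in I generated; the slide used its own data set.

```r
library(forecast)  # provides tsCV() and naive()

set.seed(1)
y <- ts(cumsum(rnorm(100)))  # stand-in series (a random walk), not the slide's data

# One-step forecast errors from time series cross-validation
e <- tsCV(y, forecastfunction = naive, h = 1)

# tsCV leaves NAs where a forecast could not be computed, so drop them
mean(e^2, na.rm = TRUE)  # mean squared error

# For the naive method there is nothing to estimate, so the non-missing
# error values coincide with those from residuals(naive(y))
```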
7. tsCV function
Here is a more sophisticated use of the function. I am computing the mean squared error for different forecast horizons based on time series cross-validation. The pipe operator comes in handy here to simplify my code. However, I did need to write my own square function because the usual approach won't work inside a sequence of pipes.
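The code being described is not shown in the transcript; here is a plausible sketch, assuming the forecast package (which re-exports the %>% pipe) and a stand-in random-walk series of my own choosing.

```r
library(forecast)  # tsCV() and naive(); also re-exports the %>% pipe

sq <- function(u) u^2  # the usual ^2 can't be dropped straight into a pipe

set.seed(42)
y <- ts(cumsum(rnorm(120)))  # stand-in series; the slide used its own data

# Cross-validated MSE at horizons 1 through 8: tsCV returns a matrix
# with one column of forecast errors per horizon
mse <- y %>%
  tsCV(forecastfunction = naive, h = 8) %>%
  sq() %>%
  colMeans(na.rm = TRUE)

mse
```

For a random walk like this one, the naive method's MSE grows roughly linearly with the horizon, matching the pattern described in the transcript.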
Notice how the MSE increases with the forecast horizon. The further ahead we forecast, the less accurate our forecasts are.
8. tsCV function
In summary, time series cross-validation is very useful for selecting a good forecasting model. In general, a good rule is to choose the model with the smallest MSE.
If a particular forecast horizon is of interest, then compute the cross-validated MSE at that horizon. That way, you are choosing the best forecast model for your purpose.
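As a concrete, hedged illustration of that rule, one might compare two simple methods at the horizon of interest. The series, the two methods (naive and meanf from the forecast package), and the horizon h = 4 are all illustrative choices, not the course's.

```r
library(forecast)  # tsCV(), naive(), meanf()

set.seed(1)
y <- ts(cumsum(rnorm(120)))  # stand-in series (a random walk)
h <- 4                       # the horizon we care about

# Column h of the tsCV error matrix holds the h-step-ahead errors
mse_naive <- mean(tsCV(y, forecastfunction = naive, h = h)[, h]^2, na.rm = TRUE)
mse_mean  <- mean(tsCV(y, forecastfunction = meanf, h = h)[, h]^2, na.rm = TRUE)

c(naive = mse_naive, mean = mse_mean)  # choose the method with the smaller MSE
```

For a random walk, the naive method should come out well ahead of the mean method at any horizon.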
9. Let's practice!
Now, let's use the tsCV function on a different data set in the next exercise.