Backtesting
1. Backtesting
Now, we will review the backtesting method for training and testing forecasting models and its implementation in Python.
Backtesting, or time series cross-validation, is the time series equivalent of cross-validation in machine learning. This method uses a window function to split the series into multiple training and testing partitions.
We then train the forecasting models using the training partitions and evaluate their performance using the testing partitions. The main difference between cross-validation and backtesting is that the partitions in backtesting must be sequential to avoid information leakage. There are two main types of window functions:
First, the expanding window, where we increment the number of observations in the training partition as we move the window to the right.
And the sliding window, where we keep the number of observations constant.
In both cases, the size of the testing partitions remains the same.
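The two window functions can be sketched as follows (plain Python, not the library's implementation; the function name and tuple layout are illustrative):

```python
def backtesting_partitions(n_obs, n_windows, h, step_size, window="expanding"):
    """Illustrative partition indices: (train_start, train_end, test_start,
    test_end), half-open, with the last test partition ending at the final
    observation."""
    first_train_end = n_obs - (n_windows - 1) * step_size - h
    partitions = []
    for i in range(n_windows):
        train_end = first_train_end + i * step_size
        # expanding: training always starts at 0 and grows;
        # sliding: training keeps a constant length of first_train_end
        train_start = 0 if window == "expanding" else train_end - first_train_end
        partitions.append((train_start, train_end, train_end, train_end + h))
    return partitions

# 100 observations, 3 partitions, 10-step test windows spaced 10 steps apart
expanding = backtesting_partitions(100, 3, 10, 10, "expanding")
sliding = backtesting_partitions(100, 3, 10, 10, "sliding")
```

In the expanding case the training partition grows by `step_size` with each window, in the sliding case it stays at 70 observations, and the testing partition is 10 observations in both.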
The advantage of using multiple testing partitions instead of a single partition is that it indicates the model's robustness. On the other hand, it comes with higher computing costs. A model that achieves a low error rate across multiple partitions gives a stronger indication of how it will perform on the actual forecast. Let's demo the process using the `cross_validation` function from the `mlforecast` library to train the forecasting models with backtesting.
9. Backtesting settings
We will train the following five models and set up a backtesting process with 10 training and testing partitions
using an expanding window, where each testing partition is 72 hours long and each partition is spaced 24 hours apart.
Lastly, we will model the forecast uncertainty using conformal prediction intervals at a 95% confidence level. Conformal prediction intervals can handle both nonparametric models such as XGBoost and parametric models such as linear regression, unlike traditional prediction intervals, which are limited to parametric models.
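A minimal sketch of the idea behind split-conformal intervals (pure Python; the helper and the toy residuals are hypothetical, not the library's implementation): the interval width comes from a quantile of held-out absolute residuals, so nothing about the model itself needs to be parametric.

```python
import math

def conformal_interval(cal_residuals, point_forecast, level=95):
    """Symmetric split-conformal interval from calibration residuals."""
    alpha = 1 - level / 100
    n = len(cal_residuals)
    # finite-sample-corrected quantile rank of the absolute residuals
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    q = sorted(abs(r) for r in cal_residuals)[rank - 1]
    return point_forecast - q, point_forecast + q

# toy calibration residuals from any model, parametric or not
residuals = [(-1) ** i * r for i, r in enumerate(range(1, 21))]
lo, hi = conformal_interval(residuals, 100.0, level=95)
```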
15. Required libraries
We will start by importing the `mlforecast` modules, specifically `MLForecast`, `Differences`, and `PredictionIntervals`, along with `expanding_mean` from `window_ops.expanding` and our required models. We will also import supporting libraries to process the data.
17. Data preparation
Next, we will load the series from a CSV file and reformat it. We will use the last two years of data to train the models, and set the series' unique id as an environment variable.
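As a sketch of the target layout (synthetic data; the raw column names here are hypothetical), `mlforecast` expects a long-format frame with `unique_id`, `ds`, and `y` columns:

```python
import pandas as pd

# Hypothetical raw file layout: a timestamp column and a value column
raw = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=6, freq="h"),
    "load": [310.0, 298.5, 305.2, 301.1, 299.9, 304.4],
})

# Rename to the default column names mlforecast looks for
series = raw.rename(columns={"time": "ds", "load": "y"})
series["unique_id"] = "hourly_load"  # assumed single-series id
series = series[["unique_id", "ds", "y"]]
```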
18. Define the forecasting models
We will define the following models: LightGBM, XGBoost, linear regression, ridge regression, and lasso regression, and, as before, assign them to the model object. We will regress the series against the last 24 lags and use seasonal features.
19. Set the backtesting parameters
We will define the backtesting parameters, setting 10 partitions with a window shift of 24 hours and a testing partition of 72 hours. We will also define the conformal intervals using the `PredictionIntervals` function.
20. Training models with backtesting
Lastly, we will execute the backtesting process using the `cross_validation` function, providing our backtesting parameters as arguments.
The function returns a DataFrame with the backtesting results.
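Conceptually, the backtesting loop behind `cross_validation` looks something like this sketch (pure Python with a stand-in historical-mean "model"; the real function trains the actual models and returns their predictions):

```python
def backtest_sketch(y, n_windows, h, step_size):
    """Expanding-window backtest: for each cutoff, fit on observations
    before the cutoff and score the next h observations."""
    results = []
    last_cutoff = len(y) - h
    cutoffs = [last_cutoff - i * step_size for i in range(n_windows)][::-1]
    for cutoff in cutoffs:
        train, test = y[:cutoff], y[cutoff:cutoff + h]
        forecast = sum(train) / len(train)  # stand-in model: historical mean
        mae = sum(abs(actual - forecast) for actual in test) / len(test)
        results.append({"cutoff": cutoff, "mae": round(mae, 4)})
    return results

# 3 overlapping 10-step test windows, spaced 5 steps apart
results = backtest_sketch(list(range(100)), n_windows=3, h=10, step_size=5)
```

Note that with the lesson's settings, a 72-hour horizon spaced 24 hours apart, consecutive testing partitions overlap; that is fine because each one is only compared against forecasts made from its own cutoff's training data.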
22. Backtesting results in wide format
It is in a wide format, with 19 columns. We will reorganize this soon to easily score model performance on the testing partitions.
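To make that scoring easier, the wide frame can be melted so each model's predictions become rows (a miniature, hypothetical version of the output with two models; the real frame also carries the prediction-interval columns):

```python
import pandas as pd

# Miniature stand-in for the wide cross-validation output
cv_df = pd.DataFrame({
    "unique_id": ["s1"] * 4,
    "ds": pd.date_range("2024-01-01", periods=4, freq="h"),
    "cutoff": pd.Timestamp("2023-12-31"),
    "y": [1.0, 2.0, 3.0, 4.0],
    "lgb": [1.0, 2.0, 3.0, 4.0],  # hypothetical model columns
    "lr": [0.0, 2.0, 3.0, 4.0],
})

# Long format: one row per model prediction
long_df = cv_df.melt(
    id_vars=["unique_id", "ds", "cutoff", "y"],
    var_name="model", value_name="prediction",
)

# Score each model across the testing partitions
mae = (long_df["y"] - long_df["prediction"]).abs().groupby(long_df["model"]).mean()
```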
23. Let's practice!
But for now, let's practice backtesting.