Backtesting
1. Backtesting
Now, we will review the backtesting method for training and testing forecasting models and its implementation in Python.
Backtesting, or time series cross-validation, is the time series equivalent of cross-validation in machine learning. This method uses a window function to split the series into multiple training and testing partitions.
We then train the forecasting models using the training partitions and evaluate their performance using the testing partitions. The main difference between cross-validation and backtesting is that the partitions in backtesting must be sequential to avoid information leakage. There are two main types of window functions:
First, the expanding window, where we increment the number of observations in the training partition as we move the window to the right.
And the sliding window, where we keep the number of observations constant.
In both cases, the size of the testing partitions remains the same.
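The two window functions can be sketched as follows (plain Python, not the library's implementation; the function name and tuple layout are illustrative):

```python
def backtesting_partitions(n_obs, n_windows, h, step_size, window="expanding"):
    """Illustrative partition indices: (train_start, train_end, test_start,
    test_end), half-open, with the last test partition ending at the final
    observation."""
    first_train_end = n_obs - (n_windows - 1) * step_size - h
    partitions = []
    for i in range(n_windows):
        train_end = first_train_end + i * step_size
        # expanding: training always starts at 0 and grows;
        # sliding: training keeps a constant length of first_train_end
        train_start = 0 if window == "expanding" else train_end - first_train_end
        partitions.append((train_start, train_end, train_end, train_end + h))
    return partitions

# 100 observations, 3 partitions, 10-step test windows spaced 10 steps apart
expanding = backtesting_partitions(100, 3, 10, 10, "expanding")
sliding = backtesting_partitions(100, 3, 10, 10, "sliding")
```

In the expanding case the training partition grows by `step_size` with each window, in the sliding case it stays at 70 observations, and the testing partition is 10 observations in both.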
The advantage of using multiple testing partitions instead of a single partition is that it indicates the model's robustness. On the other hand, it comes with higher computing costs. A model that achieves a low error rate across multiple partitions gives a stronger indication of how it will perform on the actual forecast. Let's demo the process using the `cross_validation` function from the `mlforecast` library to train the forecasting models with backtesting.
9. Backtesting settings
We will train the following five models and set up a backtesting process with 10 training and testing partitions
using an expanding window, where each testing partition is 72 hours long and each partition is spaced 24 hours apart.
Lastly, we will model the forecast uncertainty using conformal prediction intervals at a 95% confidence level. Conformal prediction intervals can handle both nonparametric models such as XGBoost and parametric models such as linear regression, unlike traditional prediction intervals, which are limited to parametric models.
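A minimal sketch of the idea behind split-conformal intervals (pure Python; the helper and the toy residuals are hypothetical, not the library's implementation): the interval width comes from a quantile of held-out absolute residuals, so nothing about the model itself needs to be parametric.

```python
import math

def conformal_interval(cal_residuals, point_forecast, level=95):
    """Symmetric split-conformal interval from calibration residuals."""
    alpha = 1 - level / 100
    n = len(cal_residuals)
    # finite-sample-corrected quantile rank of the absolute residuals
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    q = sorted(abs(r) for r in cal_residuals)[rank - 1]
    return point_forecast - q, point_forecast + q

# toy calibration residuals from any model, parametric or not
residuals = [(-1) ** i * r for i, r in enumerate(range(1, 21))]
lo, hi = conformal_interval(residuals, 100.0, level=95)
```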
15. Required libraries
We will start by importing the `mlforecast` modules, specifically `MLForecast`, `Differences`, and `PredictionIntervals`, along with `expanding_mean` from `window_ops.expanding` and our required models. We will also import supporting libraries to process the data.
17. Data preparation
Next, we will load the series from a CSV file and reformat it. We will use the last two years of data to train the models, and set the series' unique id as an environment variable.
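As a sketch of the target layout (synthetic data; the raw column names here are hypothetical), `mlforecast` expects a long-format frame with `unique_id`, `ds`, and `y` columns:

```python
import pandas as pd

# Hypothetical raw file layout: a timestamp column and a value column
raw = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=6, freq="h"),
    "load": [310.0, 298.5, 305.2, 301.1, 299.9, 304.4],
})

# Rename to the default column names mlforecast looks for
series = raw.rename(columns={"time": "ds", "load": "y"})
series["unique_id"] = "hourly_load"  # assumed single-series id
series = series[["unique_id", "ds", "y"]]
```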
18. Define the forecasting models
We will define the following models: LightGBM, XGBoost, linear regression, ridge regression, and lasso regression, and, as before, assign them to the model object. We will regress the series against the last 24 lags and use seasonal features.
19. Set the backtesting parameters
We will define the backtesting parameters, setting 10 partitions with a window shift of 24 hours and a testing partition of 72 hours. We will also define the conformal intervals using the `PredictionIntervals` function.
20. Training models with backtesting
Lastly, we will execute the backtesting process using the `cross_validation` function, providing our backtesting parameters as arguments.
The function returns a DataFrame with the backtesting results.
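Conceptually, the backtesting loop behind `cross_validation` looks something like this sketch (pure Python with a stand-in historical-mean "model"; the real function trains the actual models and returns their predictions):

```python
def backtest_sketch(y, n_windows, h, step_size):
    """Expanding-window backtest: for each cutoff, fit on observations
    before the cutoff and score the next h observations."""
    results = []
    last_cutoff = len(y) - h
    cutoffs = [last_cutoff - i * step_size for i in range(n_windows)][::-1]
    for cutoff in cutoffs:
        train, test = y[:cutoff], y[cutoff:cutoff + h]
        forecast = sum(train) / len(train)  # stand-in model: historical mean
        mae = sum(abs(actual - forecast) for actual in test) / len(test)
        results.append({"cutoff": cutoff, "mae": round(mae, 4)})
    return results

# 3 overlapping 10-step test windows, spaced 5 steps apart
results = backtest_sketch(list(range(100)), n_windows=3, h=10, step_size=5)
```

Note that with the lesson's settings, a 72-hour horizon spaced 24 hours apart, consecutive testing partitions overlap; that is fine because each one is only compared against forecasts made from its own cutoff's training data.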
22. Backtesting results in wide format
It is in a wide format, with 19 columns. We will reorganize this soon to easily score model performance on the testing partitions.
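To make that scoring easier, the wide frame can be melted so each model's predictions become rows (a miniature, hypothetical version of the output with two models; the real frame also carries the prediction-interval columns):

```python
import pandas as pd

# Miniature stand-in for the wide cross-validation output
cv_df = pd.DataFrame({
    "unique_id": ["s1"] * 4,
    "ds": pd.date_range("2024-01-01", periods=4, freq="h"),
    "cutoff": pd.Timestamp("2023-12-31"),
    "y": [1.0, 2.0, 3.0, 4.0],
    "lgb": [1.0, 2.0, 3.0, 4.0],  # hypothetical model columns
    "lr": [0.0, 2.0, 3.0, 4.0],
})

# Long format: one row per model prediction
long_df = cv_df.melt(
    id_vars=["unique_id", "ds", "cutoff", "y"],
    var_name="model", value_name="prediction",
)

# Score each model across the testing partitions
mae = (long_df["y"] - long_df["prediction"]).abs().groupby(long_df["model"]).mean()
```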
23. Let's practice!
But for now, let's practice backtesting.