1. Working with a forecast object
Now let's go through the forecasting workflow!
2. Work with a forecast object
We will perform data preparation,
train multiple forecasting models
and evaluate their performance.
To do this, we will use the statsforecast library. Then, in the exercises, you will work with mlforecast, which follows the same workflow.
3. Training approach
We will use a simple train-and-test approach, leaving the last 72 hours as a testing partition and training the models with the rest of the data. In the next chapter, we will explore a more robust training approach.
4. Required libraries
Let's get started by loading the required libraries. We will use pandas and datetime to load and process the time series data.
5. Required libraries
We will import the StatsForecast class, the DynamicOptimizedTheta, SeasonalNaive, AutoARIMA, HoltWinters, and MSTL models, and the plot_series function.
6. The statsforecast data format
As a reminder, to use the statsforecast library, the data must have the following three columns: unique_id, representing the series id,
ds - the series timestamp,
and y, the series values.
Let's modify the data to follow this format.
7. Data preparation - data load
We load our data and reformat it to match the statsforecast requirements. In addition, we set an environment variable that specifies that the unique id should be returned as a column.
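A minimal sketch of this step. The raw column names (`timestamp`, `demand`) and the tiny synthetic frame stand in for the course's actual dataset, and the sketch assumes the environment variable in question is `NIXTLA_ID_AS_COL`, which tells statsforecast to return the unique id as a regular column rather than as the index:

```python
import os
import pandas as pd

# Ask statsforecast to return unique_id as a regular column
# (rather than as the DataFrame index).
os.environ["NIXTLA_ID_AS_COL"] = "1"

# Small synthetic hourly series standing in for the real data.
raw = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="h"),
    "demand": [10.0, 12.0, 11.0, 13.0, 12.5],
})

# Rename the columns and add a series identifier to match the
# unique_id / ds / y format that statsforecast expects.
df = raw.rename(columns={"timestamp": "ds", "demand": "y"})
df["unique_id"] = "series_1"
df = df[["unique_id", "ds", "y"]]
```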
8. Data preparation - train/test split
Next, we will split the series into training and testing partitions, using datetime's timedelta to subset by period and leaving the last 72 hours as testing.
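The split can be sketched as follows, here on a synthetic 10-day hourly series (the real data and cutoff come from the lesson's dataset):

```python
from datetime import timedelta

import pandas as pd

# Synthetic hourly series: 10 days of data.
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2024-01-01", periods=24 * 10, freq="h"),
})
df["y"] = range(len(df))

# Keep the last 72 hours as the testing partition.
cutoff = df["ds"].max() - timedelta(hours=72)
train = df[df["ds"] <= cutoff]
test = df[df["ds"] > cutoff]
```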
9. Data preparation
Before moving on to modeling, let's use the plot_series function to plot the training and testing partitions.
As we saw in the previous lessons, the series has strong seasonality patterns.
10. Forecasting with StatsForecast
Let's start building our models. We will use the following five forecasting models:
the AutoARIMA, Seasonal Naive, and Theta models, each with an hourly seasonality component,
and two multi-seasonal trend (MSTL) models that handle both the hourly and the day-of-week seasonality, each using a different trend forecasting method.
11. Forecasting with StatsForecast
Next, we store those models in a list and use the StatsForecast class to define the model object.
We use the freq argument to set the series frequency to hourly. The fallback_model argument specifies which model to use in case a model fails. Lastly, we set the n_jobs argument to -1 to use all available cores during model execution.
12. Forecasting with StatsForecast
To create the forecast, we call the object's forecast method. We use the training partition as input, set the horizon to 72 hours, and set the prediction interval level to 95%.
13. Forecasting with StatsForecast
Let's use the plot_series function again to plot the forecast output alongside the testing partition.
14. Model evaluation
The last step is to evaluate the model performance. We will use these functions to calculate MAPE, RMSE and the prediction interval coverage.
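The three metrics are short enough to sketch directly. These are hand-rolled illustrations; the lesson's own helper functions (and the ready-made versions in utilsforecast) may differ in detail:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def coverage(y_true, lo, hi):
    """Share of actual values falling inside the prediction interval."""
    y_true = np.asarray(y_true, float)
    return np.mean((y_true >= np.asarray(lo)) & (y_true <= np.asarray(hi)))
```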
15. Model evaluation
We loop over the forecast object, calculate metrics for the testing partition, and append them to a pandas DataFrame.
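A rough sketch of such a loop, using a hand-made testing partition and forecast frame; the column names mimic statsforecast's `model`, `model-lo-95`, `model-hi-95` convention:

```python
import numpy as np
import pandas as pd

# Hypothetical testing partition and forecast output.
test = pd.DataFrame({"ds": range(3), "y": [10.0, 12.0, 11.0]})
forecast = pd.DataFrame({
    "ds": range(3),
    "SeasonalNaive": [10.5, 11.5, 11.0],
    "SeasonalNaive-lo-95": [9.0, 10.0, 9.5],
    "SeasonalNaive-hi-95": [12.0, 13.0, 12.5],
})

rows = []
models = ["SeasonalNaive"]  # in the lesson, one entry per fitted model
for model in models:
    err = test["y"].to_numpy() - forecast[model].to_numpy()
    rows.append({
        "model": model,
        "mape": np.mean(np.abs(err / test["y"].to_numpy())),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "coverage": np.mean(
            (test["y"] >= forecast[f"{model}-lo-95"])
            & (test["y"] <= forecast[f"{model}-hi-95"])
        ),
    })

# Collect the metrics and sort by RMSE, best model first.
results = pd.DataFrame(rows).sort_values("rmse")
```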
16. Model evaluation
Sorting the performance table by RMSE and printing the results, we see that the Seasonal Naive model achieved the lowest RMSE and MAPE scores on the testing partition.
17. Model evaluation
When examining the models' performance, the first question we should ask is whether there is room for improvement. Could using different settings yield a more accurate forecast?
Does the performance on the testing set reflect the future performance of the models, or is it just a one-time fit? This is where experiments come in handy, and they are the focus of the next chapter.
18. Let's practice!
Now it's your turn to build and evaluate forecasting models!