1. Productionizing your forecast model
Hi, my name is Rami Krispin.
I will be the instructor for this course.
2. Introduction
Before we jump into the course material, let me tell you a bit about myself:
I am a senior data science and engineering manager with a decade of experience working with time series data, building forecasting models at scale, and applying MLOps practices. I am the author of the book Hands-On Time Series Analysis and Forecasting with R, and the creator and maintainer of several open source projects.
3. Forecasting in Production
This course focuses on forecasting in production. We will learn how to design a forecasting pipeline to automate and monitor a recurring forecasting task.
4. Motivation
We typically want to productionize a forecasting task when the task requires either
automation - it recurs regularly and frequently, such as forecasting the hourly temperature every hour; or
scale - there is a large number of series, and the forecasting process requires high computing resources to support it.
And, of course, a combination of both - automating a forecasting process at scale.
Let's review the general architecture of a forecasting pipeline.
5. General architecture
It typically includes the following components:
6. General architecture
A live data source, such as an API endpoint or a database. This requires a data pipeline to automate the ETL process and ensure that our local dataset is up-to-date with the data source.
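For a flavor of what that extraction step might look like, here is a minimal sketch in Python, assuming the EIA API v2 hourly demand endpoint; the exact path, parameters, and the YOUR_API_KEY placeholder are illustrative, and we will cover the real calls later in the course:

```python
import pandas as pd
import requests

# Illustrative EIA API v2 endpoint and parameters - check the EIA docs
# for the exact path and facets for the series you need.
URL = "https://api.eia.gov/v2/electricity/rto/region-data/data/"
params = {
    "api_key": "YOUR_API_KEY",        # placeholder - register for a free key
    "frequency": "hourly",
    "data[0]": "value",
    "facets[respondent][]": "US48",   # demand for the US lower 48
    "start": "2024-01-01T00",
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()

# The v2 API nests the observations under response -> data.
raw = response.json()["response"]["data"]
data = pd.DataFrame(raw)[["period", "value"]]
data["period"] = pd.to_datetime(data["period"])
print(data.tail())
```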
7. General architecture
An experimentation framework to train, test, and evaluate the forecasting models' performance. This component is used to identify the best forecasting model.
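As a taste of this component, here is a minimal sketch that backtests two models with Nixtla's statsforecast library and logs their scores to MLflow; the toy series, model choices, and metric are assumptions for illustration:

```python
import numpy as np
import pandas as pd
import mlflow
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, HoltWinters

# Toy hourly series in the long format Nixtla's libraries expect:
# unique_id (series id), ds (timestamp), y (value).
rng = pd.date_range("2024-01-01", periods=24 * 30, freq="h")
noise = np.random.default_rng(seed=0).normal(0, 2, size=len(rng))
df = pd.DataFrame({
    "unique_id": "US48",
    "ds": rng,
    "y": 100 + 10 * np.sin(np.arange(len(rng)) * 2 * np.pi / 24) + noise,
})

sf = StatsForecast(
    models=[AutoARIMA(season_length=24), HoltWinters(season_length=24)],
    freq="h",
)

# Backtest each model on 3 rolling windows with a 24-hour horizon.
cv = sf.cross_validation(df=df, h=24, n_windows=3)

with mlflow.start_run(run_name="baseline-experiment"):
    for model in ["AutoARIMA", "HoltWinters"]:
        mape = ((cv[model] - cv["y"]).abs() / cv["y"].abs()).mean()
        mlflow.log_metric(f"mape_{model}", mape)
```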
8. General architecture
Once we identify the best model, we will deploy it in the production environment, which includes the automation and scaling layers.
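As a preview, the automation layer could be an Airflow DAG along these lines; the DAG name, task bodies, and hourly schedule are illustrative placeholders (the schedule argument assumes Airflow 2.4 or later):

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def forecast_pipeline():
    @task
    def refresh_data():
        # Pull the latest observations from the data source
        # (e.g., the EIA API) and append them to the local dataset.
        ...

    @task
    def generate_forecast():
        # Load the deployed model, produce the next forecast, and store it.
        ...

    refresh_data() >> generate_forecast()

forecast_pipeline()
```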
9. General architecture
Last but not least is the post-deployment step, which includes monitoring the model's performance in production to identify performance drift and other potential issues.
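A simple version of such a monitoring check compares recent forecasts against the actuals that arrived afterward and raises an alert when the error crosses a threshold. Here is a sketch; the performance log, column names, and threshold values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical performance log: each row pairs a forecast with the
# actual value observed later.
gen = np.random.default_rng(seed=1)
actual = 100 + gen.normal(0, 5, size=24)
log = pd.DataFrame({
    "ds": pd.date_range("2024-06-01", periods=24, freq="h"),
    "actual": actual,
    "forecast": actual + gen.normal(0, 8, size=24),  # a drifting model
})

# Score the last forecasting cycle.
mape = (log["forecast"] - log["actual"]).abs().div(log["actual"].abs()).mean()

# Compare against the error observed during experimentation; alert when
# the live error is well above it (baseline and tolerance are assumed).
BASELINE_MAPE, TOLERANCE = 0.05, 1.5
if mape > BASELINE_MAPE * TOLERANCE:
    print(f"ALERT: MAPE of {mape:.2%} exceeds threshold - check for drift")
```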
10. General architecture
Throughout this course, we will dive into the different components of this architecture. To demonstrate the process, we will use a real-life example: the US hourly demand for electricity from the EIA API.
11. Course outline
In this chapter, we will review the data source and demonstrate how to pull the data from the EIA API.
Chapter 2 focuses on the experimentation process, covering how to train, test, and log the performance of multiple forecasting models in order to identify the best forecasting approach.
Chapter 3 reviews the deployment process, including data automation, model refresh, and capturing logs. We will demonstrate how to set up the automation using Airflow.
Chapter 4 focuses on post-deployment steps, which include monitoring the pipeline and setting alerts. Last but not least, we will conclude the course with best practices.
12. Course prerequisites
This is an advanced course. To complete it successfully and apply what you learn, you will need prior knowledge of the following:
Time series analysis and forecasting.
Orchestration systems such as Airflow, GitHub Actions, etc.
Querying data from APIs, and
Python programming
13. Course tools
Here are some of the tools we will use in the course:
- Nixtla's statsforecast and mlforecast libraries to create forecasts
- MLflow to track and log the model's performance in the experiment, and
- Quarto dashboard to monitor the pipeline
This course mainly focuses on the principles, so nothing stops you from applying what you learn with other tools or programming languages, such as R or Julia.
14. Let's practice!
Let's get started!