
Pipeline architecture

1. Pipeline architecture

Welcome back!

2. Model deployment

In the previous chapters, we reviewed the data source and the model selection process using the experimentation framework.

3. Model deployment

In this chapter, we will focus on model deployment by setting up the ETL and ML pipelines.

4. Pipelines requirements

Let's start by defining the pipeline's requirements. We will refresh the data and forecast daily and set the forecast horizon to 72 hours. We want the process to be robust, which means adding unit testing and validation steps to avoid data integrity issues, forecast performance drift, and other potential errors. Lastly, we should be mindful of the future maintenance of the pipeline and avoid unnecessary complexity.
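The requirements above can be captured as a small configuration plus a unit-test-style check. The dictionary keys, values, and thresholds below are illustrative assumptions, not part of the course code:

```python
# Pipeline requirements as a config dict; names and values are assumed
# for illustration (daily refresh, 72-hour horizon, validation enabled).
PIPELINE_CONFIG = {
    "refresh_frequency": "daily",
    "forecast_horizon_hours": 72,
    "validation": {
        "check_data_integrity": True,
        "check_performance_drift": True,
    },
}


def validate_config(config: dict) -> bool:
    """Unit-test style sanity check on the pipeline settings."""
    assert config["forecast_horizon_hours"] > 0
    assert config["refresh_frequency"] in {"hourly", "daily", "weekly"}
    # Robustness requirement: every validation step must be enabled.
    assert all(config["validation"].values()), "all validation steps enabled"
    return True
```

Keeping the settings in one place like this makes future maintenance easier, in line with the goal of avoiding unnecessary complexity.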

5. Pipeline design

Once we have defined the pipeline requirements, the next step is to create the pipeline design. This design should represent the pipeline's directed acyclic graph blueprint, or DAG. Recall that DAGs help us define the execution order of the pipeline's tasks. Regardless of the tool you use to orchestrate your pipeline, having a clear end-to-end design simplifies the implementation process. Throughout this course, we will use the following pipeline design, which includes

6. Pipeline design

data ingestion process,

7. Pipeline design

forecasting automation component,

8. Pipeline design

data storage, and

9. Pipeline design

capturing logs to monitor the pipeline health, once deployed into production.
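The four components above can be sketched as a tiny DAG in plain Python, making the execution order explicit. The task names are illustrative; in practice an orchestrator such as Airflow would own this dependency graph:

```python
# The pipeline design as a DAG: each task maps to its upstream
# dependencies. Task names are assumptions for illustration.
DAG = {
    "ingest_data": [],                    # no upstream dependencies
    "forecast": ["ingest_data"],          # runs after ingestion
    "store_results": ["forecast"],        # persists the forecast output
    "capture_logs": ["store_results"],    # monitors pipeline health
}


def execution_order(dag: dict) -> list:
    """Topologically sort the tasks so each runs after its dependencies."""
    order, seen = [], set()

    def visit(task):
        if task in seen:
            return
        for upstream in dag[task]:
            visit(upstream)
        seen.add(task)
        order.append(task)

    for task in dag:
        visit(task)
    return order
```

Because the graph is acyclic, this traversal always yields a valid execution order, which is exactly what the orchestrator computes for us.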

10. Pipeline design

To set up this pipeline, we will use `Airflow` to orchestrate the different components of the pipeline and `mlflow` to store the winning model from the experimentation process. Let's start by registering the winning model.

11. Model registry

There are two common approaches for model registration with `mlflow`:

- register the model while logging the model's parameters and metrics, or
- register the model separately, after the experimentation process is complete.

Each method has pros and cons. For simplicity, we will only log the winning model. We can log the model using `mlflow`'s built-in functionality, also known as a flavor, which supports the core Python machine learning classes. For unsupported classes, we can build a custom register function. In either case, we log the fitted object, which must have a predict method. To register the winning model, we will use the `mlforecast` library's customized register function.
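The two approaches can be sketched as follows. This is not the course's code: the model, run id, and registry names are placeholders, and the imports are kept inside the functions so the sketch can be read without `mlflow` installed:

```python
# Two common mlflow registration patterns (sketch; names are placeholders).

def register_while_logging(model, name: str):
    """Approach 1: register in the same call that logs the model."""
    import mlflow
    import mlflow.sklearn

    with mlflow.start_run():
        # For a scikit-learn-compatible model, the built-in flavor can
        # log and register in one step via `registered_model_name`.
        mlflow.sklearn.log_model(
            model, artifact_path="model", registered_model_name=name
        )


def register_after_experimentation(run_id: str, name: str):
    """Approach 2: register an already-logged model after the fact."""
    import mlflow

    # Point the registry at the artifact logged under the given run.
    mlflow.register_model(f"runs:/{run_id}/model", name)
```

The first pattern is convenient during experimentation; the second decouples registration from training, which suits promoting a winner after comparing runs.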

12. Model registry

Let's start by importing the models, `mlflow`, and the `mlforecast` flavor module. We then set the experiment name and path, and pull the experiment metadata using the `get_experiment_by_name` function.

13. Model registry

We will define the `lightGBM` model and the `mlforecast` parameters.

14. Model registry

Next, we will set the `mlforecast` object and fit it to the time series. Once we have the fitted object, we can register it.

15. Model registry

Last but not least, we will set the run name using the model label and the time of the run, and log the model using the `mlforecast` flavor `log_model` function.
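The walk-through above can be condensed into one sketch. The experiment name, frequency, lag features, and the `series` DataFrame are illustrative assumptions; `mlforecast.flavor` is the mlflow integration shipped with the `mlforecast` library, and the heavier imports are kept local to the function:

```python
# Sketch of the registration walk-through; names and parameters are
# assumptions, not the course's exact values.
from datetime import datetime


def make_run_name(model_label: str) -> str:
    """Build the run name from the model label and the time of the run."""
    return f"{model_label}_{datetime.now():%Y-%m-%d_%H-%M-%S}"


def register_winning_model(series):
    """Fit the winning model and log it with the mlforecast flavor."""
    import mlflow
    import mlforecast.flavor
    from lightgbm import LGBMRegressor
    from mlforecast import MLForecast

    # Set the experiment and pull its metadata.
    mlflow.set_experiment("energy_forecast")  # experiment name assumed
    experiment = mlflow.get_experiment_by_name("energy_forecast")

    # Define the LightGBM model and the MLForecast parameters.
    fcst = MLForecast(
        models=[LGBMRegressor()],
        freq="h",        # hourly series, assumed
        lags=[1, 24],    # illustrative lag features
    )
    fcst.fit(series)     # expects columns: unique_id, ds, y

    # Log the fitted object under a descriptive run name.
    with mlflow.start_run(
        run_name=make_run_name("lightGBM"),
        experiment_id=experiment.experiment_id,
    ):
        mlforecast.flavor.log_model(fcst, artifact_path="model")
```

The fitted object passed to `log_model` has a predict method, which is what the flavor requires.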

16. Model registry

Once the model is registered, it will appear in the UI labeled with the run name that we defined.

17. Model registry

Notice that the model name is now available in the models column.

18. Model registry

The model object is stored as an artifact, including the object pickle file, environment settings, and other metadata. In the following video, we will dive into the ETL process.

19. Let's practice!

Now it's your turn. Let's move to some exercises!
