Monitoring, re-training and replacing MLOps applications

1. Monitoring, re-training and replacing MLOps applications

Now that we have trained, evaluated, and tested our model, we can actually deploy it so that we can generate business value through the machine learning application. We are in the operations phase.

2. Operating machine learning models

As mentioned in the previous chapter, the lack of an operations mindset often limits the impact of data science and machine learning. MLOps is all about putting models in production in the real world and maintaining them there. We will focus on four stages, from left to right: pre-deployment, deployment, monitoring, and re-training.

3. Pre-deployment

The pre-deployment phase consists of rather technical tasks that are, however, necessary for successful operations.

4. Pre-deployment

Here we must ensure that the models produce the same results regardless of whether they run on the development system or, for example, in the cloud. Common errors stem from different time zones or different library versions. For this reason, we usually containerize our package. Think of it as a shipping container that moves unchanged from one harbor to another. Pre-deployment also includes other steps, for example regarding security. Most tasks here are usually the job of the software engineer, sometimes also of the machine learning engineer.
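
To make the two error sources named above concrete, here is a minimal Python sketch of checks one could run before deployment. It is an illustration under assumptions, not a prescribed method: the function names `find_version_mismatches` and `to_utc` and the idea of a `pinned` dictionary of expected versions are hypothetical and not part of the course material.

```python
from datetime import datetime, timedelta, timezone
from importlib import metadata


def find_version_mismatches(pinned, installed_lookup=metadata.version):
    """Compare pinned library versions with what is installed.

    `pinned` is a hypothetical dict such as {"numpy": "1.26.4"}.
    Returns {package: (expected, found)} for every mismatch;
    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            found = installed_lookup(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


def to_utc(moment):
    """Normalize a time-zone-aware timestamp to UTC.

    Naive datetimes are rejected because their time zone is ambiguous,
    which is exactly the kind of difference between systems we want to avoid.
    """
    if moment.tzinfo is None:
        raise ValueError("naive datetime: time zone is ambiguous")
    return moment.astimezone(timezone.utc)


# Example: noon at UTC+2 is 10:00 UTC on every machine.
EXAMPLE_UTC = to_utc(datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2))))
```

Checks like these do not replace containerization. They only surface a mismatch early if the environments have diverged anyway.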

5. Deploying the MLOps application

Finally, we can deploy the model.

6. Deploying the MLOps application

We usually do this through a process called CI/CD, which stands for continuous integration and continuous delivery, to deploy faster and more frequently. If a model is already operating, different strategies exist to replace it without disturbing the users. We will touch on them later. The important thing here is that a lot can go wrong, so this step again includes various automated tests. We also want to be sure that we can easily restore previous models. Again, this is primarily the task of the software engineer or sometimes of the machine learning engineer.
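
The two safeguards mentioned here, automated tests before a model goes live and the ability to restore a previous model, can be sketched in a few lines of Python. This is a toy in-memory illustration under assumptions: the class `ModelRegistry`, its methods, and the `smoke_test` callable are hypothetical names, and a real CI/CD setup would rely on dedicated tooling instead.

```python
class ModelRegistry:
    """Toy in-memory record of deployed model versions (hypothetical)."""

    def __init__(self):
        self._history = []  # deployed versions, oldest first

    @property
    def live(self):
        """Version currently serving users, or None if nothing is deployed."""
        return self._history[-1] if self._history else None

    def deploy(self, version, smoke_test):
        """Deploy `version` only if the automated check passes.

        `smoke_test` is any callable that takes the version and returns
        True when the candidate model behaves as expected.
        """
        if not smoke_test(version):
            raise ValueError(f"automated test failed for {version!r}; not deployed")
        self._history.append(version)

    def rollback(self):
        """Restore the previously deployed version and return it."""
        if len(self._history) < 2:
            raise RuntimeError("no previous version to restore")
        self._history.pop()
        return self._history[-1]
```

A failed test leaves the live version untouched, and `rollback()` returns to the last version that was actually deployed.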

7. Monitoring the MLOps application

Once deployed, our MLOps application can actually impact our business, and we can generate returns on our ongoing investments in MLOps. A recommender system, for example, can reduce churn by enhancing user satisfaction, or a supply chain application can reduce production stops or costs by better raw material inventory planning.

8. Monitoring the MLOps application

At the same time, everything that goes wrong now can have costly consequences. So it is of utmost importance to closely monitor the application and take immediate action if something goes wrong, especially in a customer-facing application. The monitoring will, of course, happen largely automatically, but the whole project team is needed here. The data engineer is responsible for implementing tests for the data pipeline, the data scientist for the model predictions, and the software engineer for the embedding in larger applications. If something goes wrong, the team often needs to be brought in to decide how to fix the issue. What we usually witness is a deterioration of the model quality over time. The economic situation or the customers' preferences change, which impacts the quality of the model's predictions. Often, the underlying issue is that the data used for training is no longer sufficiently representative of real-world data.
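
One simple way to automate a check for training data that is no longer representative is to compare a feature's live values with its training values. The sketch below is an assumption-laden illustration, not the course's method: it measures how far the live mean has moved, in units of the training standard deviation, and the function names and the threshold of 2.0 are hypothetical choices.

```python
from statistics import mean, stdev


def mean_shift_in_std(training_values, live_values):
    """Distance between live and training mean, in training standard deviations."""
    base_std = stdev(training_values)
    if base_std == 0:
        raise ValueError("training values are constant; shift is undefined")
    return abs(mean(live_values) - mean(training_values)) / base_std


def has_drifted(training_values, live_values, threshold=2.0):
    """Flag the feature when its live mean moved more than `threshold` deviations.

    The default threshold is an illustrative assumption and would need
    tuning for each feature and application.
    """
    return mean_shift_in_std(training_values, live_values) > threshold
```

For example, with training values `[10, 12, 11, 9, 13, 10, 11, 12]`, live values around 11 would not be flagged, while live values around 20 would. A check on the mean alone misses other changes, such as a shift in spread, so in practice it would be one of several monitors.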

9. Re-training the machine learning models

This requires us to re-train the machine learning models on new and updated input data, both regularly and on demand. The reason is that our business environment changes, and with it our input data, the model itself, or our business requirements.

10. Re-training the machine learning models

Re-training can, to some extent, be automated, for example by re-training on a fixed schedule such as every week, or by defining specific criteria for when to re-train based on the monitored predictive performance. However, automation is not always sufficient. If we move, for example, from a lasting low-inflation regime to a high-inflation one, we might have to look for different features and new models, or even go back to the design phase to adjust the business requirements. This is why MLOps is not a one-way street but rather a cycle. Re-training, like model development, is mainly the task of the data scientist or machine learning engineer.
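
The two automated triggers described above, a regular schedule and a performance criterion, could be combined as in the following Python sketch. It is a minimal illustration under assumptions: the function name `retraining_reasons`, the use of accuracy as the monitored metric, and the default limits of seven days and 0.85 are hypothetical.

```python
from datetime import datetime, timedelta


def retraining_reasons(last_trained, now, current_accuracy,
                       max_age=timedelta(days=7), min_accuracy=0.85):
    """Return the list of triggers that currently call for re-training.

    "schedule" fires when the model is at least `max_age` old,
    "performance" fires when the monitored accuracy fell below `min_accuracy`.
    An empty list means no automated trigger applies.
    """
    reasons = []
    if now - last_trained >= max_age:
        reasons.append("schedule")
    if current_accuracy < min_accuracy:
        reasons.append("performance")
    return reasons
```

A rule like this only covers the automatable cases. A structural change such as a new inflation regime would still need a person to decide whether new features, new models, or new requirements are needed.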

11. Let's practice

Now let's reinforce our understanding by practicing!