Automation in MLOps deployment strategies

1. Automation in MLOps deployment strategies

Welcome to this video about MLOps deployment strategies.

2. Model deployment - prediction service

We will focus on the prediction services component in our reference architecture.

3. Prediction services - modes recap

Our prediction service delivers machine learning predictions to end consumers with varying requirements. We can configure it to support batch, streaming, real-time, and on-the-edge predictions. The prediction service should be backed by a fully automated MLOps architecture. With this, we ensure scalability, reliability, and real-time performance to handle the demands of different consumers.

4. Prediction service - batch serving

In batch serving, we usually make many predictions in one go, typically on periodic schedules.
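As a minimal sketch of what such a job might look like, assuming a scikit-learn-style model saved as model.pkl, input rows in daily_records.csv, and an external scheduler such as cron triggering the script (all illustrative names):

```python
# batch_predict.py - a minimal batch-serving sketch.
# Assumes a trained model saved as "model.pkl" and accumulated input rows in
# "daily_records.csv" (both hypothetical). A scheduler such as cron triggers
# the script, e.g. "0 2 * * * python batch_predict.py" for a nightly run.
import pickle

import pandas as pd


def run_batch_job(model_path: str, input_path: str, output_path: str) -> None:
    # Load the trained model once for the whole batch.
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    # Score all accumulated records in one go.
    records = pd.read_csv(input_path)
    records["prediction"] = model.predict(records)

    # Persist predictions for downstream consumers.
    records.to_csv(output_path, index=False)


if __name__ == "__main__":
    run_batch_job("model.pkl", "daily_records.csv", "predictions.csv")
```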

5. Prediction service - streaming serving

Another type of prediction service is stream predictions, where the prediction service continuously processes incoming data records and returns an ML prediction for each one.
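A minimal sketch of this loop, using the kafka-python client as one possible broker; the topic names, broker address, and model file are assumptions for illustration:

```python
# stream_predict.py - a minimal streaming-serving sketch using kafka-python.
# Topic names, the broker address, and "model.pkl" are illustrative
# assumptions; any message broker would follow the same consume-predict-emit
# loop.
import json
import pickle

from kafka import KafkaConsumer, KafkaProducer

with open("model.pkl", "rb") as f:  # hypothetical trained model
    model = pickle.load(f)

consumer = KafkaConsumer(
    "incoming-records",                      # assumed input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Continuously process incoming records and emit one prediction per record.
for message in consumer:
    features = message.value["features"]
    prediction = model.predict([features])[0]
    producer.send("predictions", value={"prediction": float(prediction)})
```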

6. Prediction service - real time

In the real-time case, the prediction service processes a single record and returns the prediction instantly.
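A minimal sketch of a real-time endpoint, here using Flask; the route, feature format, and model file are assumptions for illustration:

```python
# realtime_predict.py - a minimal real-time serving sketch using Flask.
# The /predict route, the "features" payload format, and "model.pkl" are
# illustrative assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # hypothetical trained model
    model = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    # One record in, one prediction out, returned immediately.
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": float(prediction)})


if __name__ == "__main__":
    app.run(port=8080)
```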

7. Prediction service - on the edge

Finally, ML prediction services can run directly on edge devices, such as mobile or IoT devices, and make predictions locally, reducing latency and communication costs.
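As one possible sketch of on-device inference, using the lightweight tflite-runtime interpreter common on mobile and IoT hardware; the model file and input shape are assumptions for illustration:

```python
# edge_predict.py - a minimal on-device inference sketch with tflite-runtime.
# "model.tflite" and the zero-filled sample input are illustrative
# assumptions; the point is that no network call is needed.
import numpy as np
from tflite_runtime.interpreter import Interpreter

# Load the compact model shipped with the device.
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run a single local prediction, e.g. on a sensor reading.
sample = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```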

8. Deployment strategies

In MLOps, an ML model's serving type determines how we should deploy and update our prediction services. Different serving types have varying requirements for data processing, resource allocation, and response time. We therefore need to design the deployment strategy accordingly. Deployment strategies include shadow deployment, canary deployment, A/B testing, and blue/green deployment. Let us go through some of these deployment strategies.

9. A/B testing

In A/B testing, we have two models, A and B, running in parallel. The prediction requests from the clients are directed to a load balancer responsible for distributing incoming requests between A and B. The performance of each model is continuously monitored, and the load balancer uses this information to adjust the distribution of requests between the two models. The goal is to ensure that the model with the best performance handles most of the prediction requests while the other model is used for validation purposes.

10. A/B testing

By automatically shifting requests to the better-performing model over time, an A/B testing deployment strategy allows the deployment to automatically adjust and use the best-performing model, ensuring the highest quality of predictions.
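A toy sketch of such a performance-aware router; the model names, the accuracy metric, and the rebalancing rule are illustrative assumptions, and real systems would delegate this to serving infrastructure:

```python
# ab_router.py - a toy sketch of a performance-aware A/B load balancer.
# Model names "A"/"B", the accuracy metric, and the winner-takes-most update
# rule are all illustrative assumptions.
import random


class ABRouter:
    def __init__(self):
        # Start with an even split between model A and model B.
        self.weights = {"A": 0.5, "B": 0.5}
        self.metrics = {"A": [], "B": []}

    def route(self) -> str:
        # Send each request to A or B in proportion to the current weights.
        return random.choices(
            ["A", "B"], weights=[self.weights["A"], self.weights["B"]]
        )[0]

    def record_outcome(self, model: str, correct: bool) -> None:
        # Monitoring feeds back whether each served prediction was correct.
        self.metrics[model].append(1.0 if correct else 0.0)

    def rebalance(self, floor: float = 0.1) -> None:
        # Shift most traffic to the better-performing model, but keep a small
        # share on the other model so it can still be validated.
        acc = {m: sum(v) / len(v) for m, v in self.metrics.items() if v}
        if len(acc) < 2:
            return
        best = max(acc, key=acc.get)
        other = "B" if best == "A" else "A"
        self.weights[best] = 1.0 - floor
        self.weights[other] = floor
```

Keeping a non-zero floor on the weaker model is what lets the deployment keep collecting evidence rather than locking in an early winner.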

11. Shadow deployment

In a shadow deployment, a new model runs in parallel with the production model. Incoming requests are sent to both, but only the live model delivers predictions to the client. Outputs from both are compared and monitored for performance differences. If the shadow model performs better, it can replace the live model as the new production model.
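A minimal sketch of this pattern; the model objects and the logging scheme are assumptions for illustration:

```python
# shadow_predict.py - a minimal shadow-deployment sketch. Both models score
# every request, but only the live model's output reaches the client; the
# shadow output is logged for offline comparison. The model objects and
# logging format are illustrative assumptions.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("shadow")


def handle_request(features, live_model, shadow_model):
    live_pred = live_model.predict([features])[0]
    shadow_pred = shadow_model.predict([features])[0]

    # Log both outputs so their performance can be compared later;
    # if the shadow model consistently wins, it can be promoted.
    logger.info("live=%s shadow=%s features=%s",
                live_pred, shadow_pred, features)

    # Only the live model's prediction is returned to the client.
    return live_pred
```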

12. Blue/Green deployment

In blue/green deployments, we start with a model running in production in a "blue" environment.

13. Blue/Green deployment

We wish to deploy an updated model. We do this via a replica of the blue environment, called the green environment. Prediction traffic is then gradually switched from the blue to the green environment, according to our specifications.

14. Blue/Green deployment

After a while, if the system runs without issues, most traffic will have switched to the green environment.

15. Blue/Green deployment

Until finally, all traffic can be handled by this newly updated system.
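A toy sketch of such a gradual cutover; the step schedule and the health check are illustrative assumptions, and real setups do this at the router or ingress layer:

```python
# bluegreen_shift.py - a toy sketch of gradually shifting traffic from the
# blue environment to its green replica. The step schedule and the healthy()
# check are illustrative assumptions.
import random


def choose_environment(green_share: float) -> str:
    # Route each request to green with probability green_share.
    return "green" if random.random() < green_share else "blue"


def gradual_cutover(healthy, steps=(0.1, 0.25, 0.5, 0.75, 1.0)) -> float:
    # Increase the green share step by step, as long as the green
    # environment stays healthy; otherwise fall back to blue entirely.
    green_share = 0.0
    for share in steps:
        if not healthy():
            return 0.0  # instant rollback: all traffic back to blue
        green_share = share
    return green_share  # 1.0 means the cutover is complete
```

Because the blue environment stays intact until the cutover completes, rollback is as simple as sending the traffic share back to blue.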

16. Deploying and updating prediction services

When selecting a deployment strategy in MLOps, important dimensions to consider include downtime during deployment, condition-based deployment, rollback time in case of failure, and additional deployment costs. These factors can be used to evaluate and compare various deployment strategies such as shadow, canary, A/B testing, and blue/green. By carefully weighing each of these dimensions, organizations can select the optimal deployment strategy for their needs, ensuring a smooth and efficient deployment process with minimal risk and downtime. In summary, the deployment strategy must be chosen based on the serving type to ensure the efficient and effective delivery of ML predictions.

17. Let's practice!

Great work completing this video! Let's practice!