
When time matters - a bit

You have learned how the acceptable latency of your machine learning service shapes the choice of the serving mode you implement.

Sometimes users can wait for days, even weeks. Sometimes, a second is too much.

The lower the expected latency, the greater the engineering challenges and the cost of your service become. Therefore, avoid over-engineering: match the design of your ML service to what users actually require and are willing to pay for.
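As a concrete illustration of a design that tolerates a few minutes of latency, here is a minimal sketch of asynchronous serving using only Python's standard library: requests are accepted immediately, processed by a background worker, and the client polls for the result instead of holding a connection open. The names run_model, submit, and get_result are hypothetical placeholders, not part of any specific framework.

```python
import queue
import threading
import time
import uuid

jobs = {}                 # job_id -> result (None while still running)
pending = queue.Queue()   # requests waiting to be processed

def run_model(payload: str) -> str:
    # Hypothetical stand-in for the real model call (e.g. summarizing a document).
    time.sleep(2)  # pretend inference takes a while
    return f"summary of {payload}"

def worker() -> None:
    # Background worker: pulls requests off the queue one at a time.
    while True:
        job_id, payload = pending.get()
        jobs[job_id] = run_model(payload)
        pending.task_done()

def submit(payload: str) -> str:
    # Accept the request instantly and return a job ID the client can poll.
    job_id = str(uuid.uuid4())
    jobs[job_id] = None
    pending.put((job_id, payload))
    return job_id

def get_result(job_id: str):
    # Returns None until the worker has finished the job.
    return jobs[job_id]

threading.Thread(target=worker, daemon=True).start()

job = submit("large_report.pdf")
while get_result(job) is None:   # a real client would poll over HTTP
    time.sleep(0.5)
print(get_result(job))
```

The point of the pattern is that the service never has to answer within milliseconds, so it can run on cheaper, shared compute; a strict sub-second requirement would instead push you toward an always-on, low-latency endpoint.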

For example, say you are building an ML service that analyzes and summarizes large PDF documents. If your users tell you they would like to receive its outputs within 5 minutes of making a request, the most reasonable serving mode for your use case would be:

