When time matters - a bit
You have learned how the acceptable latency of your Machine Learning service impacts the choice of serving mode you implement.
Sometimes users can wait for days, even weeks. Sometimes, a second is too much.
The lower the expected latency, the bigger the engineering challenges and the cost of your service become. Therefore, avoid over-engineering: match the design of your ML service to what the users require and are willing to pay for.
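The trade-off above can be sketched as a simple decision helper. The thresholds and mode names below are illustrative assumptions, not fixed rules; real cut-offs depend on your users, infrastructure, and budget.

```python
def choose_serving_mode(max_latency_seconds: float) -> str:
    """Pick a serving mode from an acceptable-latency budget.

    The thresholds here are illustrative assumptions, not industry
    standards: tune them to your own users and cost constraints.
    """
    if max_latency_seconds < 1:
        # Sub-second budgets demand a synchronous, always-on endpoint.
        return "real-time (synchronous API)"
    if max_latency_seconds < 3600:
        # Minutes-scale budgets fit an asynchronous request/response flow,
        # e.g. a task queue that processes jobs and notifies the caller.
        return "asynchronous (task queue)"
    # Hours or longer: precompute results on a schedule.
    return "batch (scheduled jobs)"

# A 5-minute latency budget points to asynchronous serving:
print(choose_serving_mode(5 * 60))  # -> asynchronous (task queue)
```

The point is not the exact thresholds but the direction of the mapping: a tighter latency budget pushes you toward a more complex (and more expensive) serving mode.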
For example, say you are building an ML service that analyzes and summarizes large PDF documents. If your users tell you they would like to receive the outputs within 5 minutes of making a request, the most reasonable serving mode for your use case would be:
This exercise is part of the course
MLOps Deployment and Life Cycling
