
Monitoring ML services

1. Monitoring ML services

Welcome to the last chapter of this course, which is dedicated to monitoring and maintaining our models. Our ML service is now up and running, delivering thousands of predictions each minute. We've even got several paying customers.

2. Maintaining quality

But when users start paying for our service, they also expect us to guarantee a certain level of quality. Quality assurance starts with quality control, which in this case means monitoring.

3. Performance indicators

First of all, we must keep track of the fundamental health indicators of our service: is it up and running? How many requests are we receiving, and how many are we handling successfully? Is the latency acceptable? But the ultimate quality we need to deliver is that of our predictions. You may have heard that "machine learning models deteriorate over time", but what does that actually mean?
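As a rough illustration of these health indicators, here is a minimal sketch that summarises a batch of request records. The `RequestRecord` class, its fields, and the choice of the 95th percentile are assumptions made for this example, not part of the course material.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestRecord:
    status_code: int   # HTTP status returned to the client (assumed field)
    latency_ms: float  # time taken to serve the request (assumed field)


def health_summary(records):
    """Summarise request volume, success rate, and p95 latency."""
    total = len(records)
    if total == 0:
        return {"requests": 0, "success_rate": None, "p95_latency_ms": None}
    # Count 2xx responses as successfully handled requests.
    ok = sum(1 for r in records if 200 <= r.status_code < 300)
    latencies = [r.latency_ms for r in records]
    # quantiles() needs at least two points; the last of the 19 cut
    # points for n=20 is the 95th percentile.
    if total > 1:
        p95 = quantiles(latencies, n=20, method="inclusive")[-1]
    else:
        p95 = latencies[0]
    return {"requests": total, "success_rate": ok / total, "p95_latency_ms": p95}
```

A monitoring job could call `health_summary` on each minute of logged requests and alert when the success rate or latency crosses a threshold agreed with the customers.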

4. Predictive performance

Let's illustrate it with an example. Say we're building a classifier that uses two input features to separate certain objects into two classes, as depicted on the diagram.

5. Learned boundary

During training, this classifier learns the decision boundary between these two classes.

6. From then on

From then on, every new object that falls to the left of the boundary will be assigned class one, and every object that falls to the right will be assigned class two.

7. But the world is changing

But, as the real world is constantly changing

8. Change 2

the properties of these classes can soon change

9. Change 3

so that the actual boundary moves elsewhere in this feature space.

10. Concept drift

We call this "concept drift" and define it as any "significant change in the actual relationship between the input and output features". As our model keeps making predictions based on the old boundary, it will make more and more incorrect predictions.

11. Degrading by not changing

So, interestingly, the model deteriorates by staying the same while the reality it should represent is changing.
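To make this concrete, here is a toy simulation, assuming a vertical boundary on the first of two features and uniformly distributed inputs (both are simplifications chosen for this sketch). The model keeps the boundary it learned at 0.5 while the true boundary moves to the right.

```python
import random


def make_batch(n, true_boundary, rng):
    """Draw n points (x1, x2) in the unit square and label them.

    The true class is 1 to the left of the boundary and 2 to the right.
    """
    points = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [1 if x1 < true_boundary else 2 for x1, _ in points]
    return points, labels


def frozen_model(point, learned_boundary=0.5):
    """Classifier that never changes after training."""
    x1, _ = point
    return 1 if x1 < learned_boundary else 2


def accuracy(points, labels):
    hits = sum(frozen_model(p) == y for p, y in zip(points, labels))
    return hits / len(points)


rng = random.Random(0)
for true_boundary in (0.5, 0.6, 0.7, 0.8):
    points, labels = make_batch(10_000, true_boundary, rng)
    print(f"true boundary {true_boundary:.1f}: accuracy {accuracy(points, labels):.3f}")
```

The model code is identical in every iteration; only the labelling rule changes, and accuracy falls as the two boundaries move apart.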

12. Detecting concept drift

We need to detect concept drift as soon as possible, but how do we do it?

13. Compare against ground truth

In theory, it's as simple as checking if our model's predictions match the ground truth as often as expected. But in practice, things are never that simple.
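The idealised check could look like the following sketch, which tracks accuracy over a sliding window of verified predictions. The class name, the window size, and the tolerance are hypothetical choices for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Compare rolling accuracy against the accuracy we expect."""

    def __init__(self, expected_accuracy, tolerance=0.05, window=500):
        self.expected_accuracy = expected_accuracy
        self.tolerance = tolerance
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, ground_truth):
        self.outcomes.append(prediction == ground_truth)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        acc = self.rolling_accuracy()
        # Flag drift only when accuracy falls clearly below expectation.
        return acc is not None and acc < self.expected_accuracy - self.tolerance
```

For example, a monitor created with `AccuracyMonitor(expected_accuracy=0.9)` would flag drift once the rolling accuracy drops below 0.85.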

14. Verification latency

The main catch is that the ground truth only becomes available some time after the prediction is made. We call this delay the verification latency.
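One way to measure this delay is to log every prediction with an identifier and match it to the ground truth when it eventually arrives. The sketch below assumes such identifiers exist; `PredictionLog` and its method names are made up for this example.

```python
from datetime import datetime, timedelta


class PredictionLog:
    """Match predictions with ground truth that arrives later."""

    def __init__(self):
        self._pending = {}  # prediction_id -> (predicted_label, predicted_at)
        self.verified = []  # (prediction_id, was_correct, verification_latency)

    def log_prediction(self, prediction_id, label, at):
        self._pending[prediction_id] = (label, at)

    def log_ground_truth(self, prediction_id, true_label, at):
        """Record the true label and return the verification latency."""
        entry = self._pending.pop(prediction_id, None)
        if entry is None:
            return None  # unknown ID, or this prediction was already verified
        label, predicted_at = entry
        latency = at - predicted_at
        self.verified.append((prediction_id, label == true_label, latency))
        return latency

    def unverified_count(self):
        """Predictions still waiting for their ground truth."""
        return len(self._pending)


log = PredictionLog()
log.log_prediction("tx-1", "fraud", datetime(2024, 1, 1))
latency = log.log_ground_truth("tx-1", "fraud", datetime(2024, 3, 31))
print(latency)
```

Predictions that never leave the pending set correspond to cases where the ground truth is never observed.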

15. Verification by use case

In cases such as stock prediction, the ground truth may be available in less than a second. In financial fraud detection, we're talking about months. In other areas, we might never see the ground truth: if a bank denied somebody a loan based on the model's prediction, we would NEVER know if that decision was, in fact, wrong.

16. Never free

But even when the ground truth is available, obtaining it can be a slow and expensive process. So what other options do we have?

17. Input feature monitoring

We can work with what we have and monitor the input features.

18. Input monitoring 2

If we notice that the distribution of the inputs, including the relationships between them, has changed since the model was trained

19. Covariate shift

we call this "covariate shift", because features are also called "covariates" in statistical jargon. In practice, covariate shift is a good, though imperfect, indicator of concept drift.
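A simple way to look for such a change in a single feature is to compare its values at training time with its recent values using the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distribution functions. This is only one possible technique, and the sketch below checks one feature at a time, so it would miss changes that affect only the relationships between features. The sample sizes and the simulated Gaussian data are assumptions.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (0 to 1)."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # Both step functions only change at observed values, so it is
    # enough to evaluate the gap at every observed value.
    for v in ref + cur:
        f_ref = bisect_right(ref, v) / len(ref)
        f_cur = bisect_right(cur, v) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]

print("no shift:", round(ks_statistic(training, same), 3))
print("shifted: ", round(ks_statistic(training, shifted), 3))
```

A value near 0 suggests the feature still looks like the training data, while a value near 1 means the two samples barely overlap. The alert threshold has to be chosen per feature and sample size.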

20. Limitations of input monitoring

Covariate shift can happen without any concept drift, and we can have concept drift without any covariate shift. And, of course, we can have both phenomena simultaneously.
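The two one-sided cases can be reproduced in a toy setting. The sketch below uses a single feature and uniform inputs for brevity, with a model whose boundary is frozen at 0.5; all of these are assumptions made for illustration.

```python
import random


def true_label(x, boundary):
    return 1 if x < boundary else 2


def model(x):
    # Frozen boundary learned at training time.
    return true_label(x, 0.5)


def simulate(n, low, high, boundary, rng):
    """Return (mean input, model accuracy) for inputs uniform on [low, high]."""
    xs = [rng.uniform(low, high) for _ in range(n)]
    acc = sum(model(x) == true_label(x, boundary) for x in xs) / n
    return sum(xs) / n, acc


rng = random.Random(1)
training = simulate(10_000, 0.0, 1.0, 0.5, rng)
shift_only = simulate(10_000, 0.3, 1.3, 0.5, rng)  # inputs move, boundary stays
drift_only = simulate(10_000, 0.0, 1.0, 0.7, rng)  # boundary moves, inputs stay

for name, (mean_x, acc) in [("training", training),
                            ("covariate shift only", shift_only),
                            ("concept drift only", drift_only)]:
    print(f"{name}: mean input {mean_x:.2f}, accuracy {acc:.2f}")
```

In the first case an input monitor would raise an alert although the model is still correct. In the second case the inputs look unchanged while accuracy drops, so input monitoring alone would miss the problem.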

21. Output monitoring

Input monitoring should, for those reasons, be supplemented with output monitoring. Again, nothing is better than having the ground truth. Still, in its absence, changes in the distribution of model outputs are a solid indicator of reality changing and our model requiring maintenance.
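For a classifier, one simple output check is to compare the share of each predicted class in a reference period with the share in a recent period. The sketch below uses the total variation distance for that comparison; the metric, the function names, and the example numbers are choices made for this illustration.

```python
from collections import Counter


def class_shares(predictions):
    """Fraction of predictions that fall into each class."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(reference_preds, current_preds):
    """Half the sum of absolute differences in class shares (0 to 1)."""
    ref = class_shares(reference_preds)
    cur = class_shares(current_preds)
    labels = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(l, 0.0) - cur.get(l, 0.0)) for l in labels)


# Hypothetical data: the model used to predict both classes equally often,
# but now predicts class 1 for 80% of requests.
reference = [1] * 50 + [2] * 50
recent = [1] * 80 + [2] * 20
print(total_variation_distance(reference, recent))
```

A distance of 0 means the class mix is unchanged, and 1 means the two periods share no predicted classes at all. A sustained increase is a prompt to investigate, not proof that the model is wrong.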

22. Let's practice!

These are some essential concepts, so let's take a moment to practice them!
