Is your deployed AI system successful?

1. Is your deployed AI system successful?

How can we assess the success of an AI solution before and after its deployment? Does our solution contribute to achieving the intended business goal, and does it deliver a positive return on investment?

2. When to measure success?

The success of an AI initiative or product must be assessed and closely monitored.

3. When to measure success?

Not only during its development, but also after its deployment in production, in parallel with a continuous performance monitoring process.

4. Measuring performance offline - accuracy

During development, particularly for Machine Learning or Deep Learning models, we want to use some metrics that help determine how well they perform before releasing them into production. Take a classification model as an example, where accuracy is normally the central metric to look at. If we have a dataset of pre-labeled penguin observations from three different species,

5. Measuring performance offline - accuracy

we only use a portion of the labeled data to train our model, in this case, a classifier that tries to learn from the data features to distinguish between penguin species.

6. Measuring performance offline - accuracy

We then validate our model by taking the remaining examples we left aside and passing them, without their labels, to our trained model, which tries to predict the right penguin species for as many validation examples as possible.
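As a rough sketch of this train-and-validate flow, assuming the publicly available penguins dataset bundled with seaborn and a scikit-learn decision tree (illustrative choices, not necessarily the setup used in the course), it could look like this:

```python
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load labeled penguin observations (three species) and drop incomplete rows
penguins = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X = penguins[features]
y = penguins["species"]

# Keep a portion of the labeled data aside for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a classifier on the training portion only
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Pass the held-out examples (without their labels) to the trained model
predictions = clf.predict(X_val)
```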

7. Measuring performance offline - accuracy

Some predictions will be correct, some will not.

8. Measuring performance offline - accuracy

In a nutshell, this is the essence of accuracy metrics: measuring the performance of an ML (or, more generally, an AI) solution against new data, based on the proportion of times it gives the right output.
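To make this concrete, here is a tiny illustration with made-up labels, using scikit-learn's accuracy_score; the commented line shows how the same call would apply to the validation split from the earlier sketch:

```python
from sklearn.metrics import accuracy_score

# Made-up true species and model predictions for four validation examples
y_true = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
y_pred = ["Adelie", "Gentoo", "Adelie", "Adelie"]

# 3 correct predictions out of 4 gives an accuracy of 0.75
print(accuracy_score(y_true, y_pred))

# On the validation set from the earlier sketch, this would be:
# accuracy_score(y_val, predictions)
```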

9. Beyond accuracy - error and other metrics

Depending on the problem and type of solution, there are other important metrics to monitor. In regression models, for instance, performance is described by the error between numerical predictions and actual outputs. In search and recommendation engines, the ordered ranking of results must be assessed in terms of user relevance or diversity, and so on. If our model performance is not as expected, we may have to improve it by fine-tuning it or by improving the quality of the training data, until it performs satisfactorily.
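As a small illustrative example of such error metrics, here is how mean absolute error and root mean squared error could be computed with scikit-learn on made-up regression outputs:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual values and numerical predictions from a regression model
y_actual = np.array([250.0, 300.0, 180.0, 420.0])
y_predicted = np.array([240.0, 330.0, 200.0, 400.0])

# Average absolute difference between predictions and actual outputs
mae = mean_absolute_error(y_actual, y_predicted)

# Root mean squared error penalizes large individual errors more strongly
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))

print(f"MAE: {mae:.1f}, RMSE: {rmse:.1f}")  # MAE: 20.0, RMSE: 21.2
```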

10. Measuring success in production

But the story doesn't end there. We need to keep observing the model's performance, both in terms of the performance metrics discussed earlier and in terms of its contribution to business goals. A deployed model's performance must be closely monitored for multiple reasons. One of the most obvious is model degradation, which happens when the assessed metric starts to deteriorate over time, for example because the nature of the data consumed changes, signaling the need to re-train our model. The concept of a KPI (or Key Performance Indicator) is commonly used to quantify the business success of an AI system. A KPI is a measurable indicator of the performance and progress of specific objectives in an organization.
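As a minimal sketch of such a monitoring check, assuming we periodically recompute accuracy on recently labeled production data and compare it against the accuracy observed at deployment time (all names, labels, and thresholds below are illustrative):

```python
from sklearn.metrics import accuracy_score

# Accuracy measured on the validation set at deployment time (illustrative value)
baseline_accuracy = 0.95

# Maximum drop we tolerate before flagging degradation (illustrative threshold)
max_allowed_drop = 0.05

def check_for_degradation(y_recent_true, y_recent_pred):
    """Compare accuracy on recently labeled production data with the baseline."""
    current_accuracy = accuracy_score(y_recent_true, y_recent_pred)
    if baseline_accuracy - current_accuracy > max_allowed_drop:
        print(f"Possible degradation (accuracy {current_accuracy:.2f}): consider re-training.")
    else:
        print(f"Model looks healthy (accuracy {current_accuracy:.2f}).")

# Hypothetical labels collected from production over the last period
check_for_degradation(
    ["Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    ["Adelie", "Adelie", "Gentoo", "Adelie"],
)
```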

11. Risks: what could possibly go wrong?

Finally, it is realistic to assume that many types of risks can get in the way during our journey to a successful AI solution. Here are some examples of risks, some of which we will discuss later: Data bias leading to discriminatory outcomes. Lack of transparency to understand AI decisions. Ethical concerns like responsible data use. System reliability and robustness against errors. And possible vulnerabilities to cyber threats. One of the ways to identify risks is by developing a Proof-of-Concept before the final AI product or solution. A PoC is a pilot version of the solution to demonstrate its feasibility and potential value.

12. Let's practice!

Time to practice.