
Model performance measurement

1. Model performance measurement

Great, in this video, we will review several model performance measurement metrics that you can use when assessing the quality of the machine learning predictions.

2. Performance measurement types

There are a number of model performance measures, but we will focus on the ones measuring the fit of supervised learning models. Accuracy metrics measure how well we predict classes in classification, like churn status or fraud status. Error metrics tell us how far our predictions are from the real values in regression.

3. Classification performance

There are a number of ways to measure classification performance, but typically you will hear about 3 metrics - accuracy, precision and recall.

4. Churn example

Ok, let's check out a churn example in a chart.

5. Churn example

Here, you see 10 customers, churned and non-churned, shown as dots, and two features - number of complaints on the x axis, and purchases this year on the y axis.

6. Churn prediction

We can build a simple prediction which uses a straight line to predict if that customer has churned.

7. Mis-classified items

Here, we have a total of two incorrectly classified observations. What if the cost of missing a churned customer is very high?

8. Another churn prediction

Then it's better to misclassify a non-churner as a churner than the other way around. In fact, this option captures all churned customers and has only one incorrect prediction.

9. Accuracy

Let's pause for a bit to talk about the metrics. Accuracy is a measure of total correct predictions divided by all observations. In this case we misclassified only one observation out of 10, so our accuracy is 90%.
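As a minimal sketch of the accuracy calculation above: the labels below are made up to match the counts on the slide (10 customers, 5 actual churners, one non-churner flagged as churned), and the `accuracy` helper is a hypothetical name used only for illustration.

```python
# Hypothetical labels matching the slide: 1 = churned, 0 = not churned.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # one non-churner flagged as churned


def accuracy(actual, predicted):
    """Share of observations where the prediction matches the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


print(f"Accuracy: {accuracy(y_true, y_pred):.0%}")  # 9 correct out of 10
```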

10. Precision

With precision, we are only interested in the churned customer predictions. Precision measures how many of the churn predictions are correct - here we predicted 6 observations as churned while only 5 of them actually churned, so the precision is 5 out of 6, or around 83%.
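The precision figure can be reproduced with a short sketch. The labels are again made up to match the slide's counts (6 churn predictions, 5 of them correct), and `precision` is a hypothetical helper, not a function from the course.

```python
# Hypothetical labels matching the slide: 1 = churned, 0 = not churned.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]


def precision(actual, predicted, positive=1):
    """Of all observations predicted as positive, the share that truly are."""
    predicted_positive = sum(1 for p in predicted if p == positive)
    true_positive = sum(
        1 for a, p in zip(actual, predicted) if a == positive and p == positive
    )
    if predicted_positive == 0:
        # Assumption: report 0.0 when nothing is predicted positive;
        # conventions for this edge case vary.
        return 0.0
    return true_positive / predicted_positive


print(f"Precision: {precision(y_true, y_pred):.1%}")  # 5 of 6 churn predictions
```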

11. Recall

Recall measures how many of the total churned customers the model was able to capture. Our model identified all 5 churned customers, which means recall is 100%. There's always tension between precision and recall, and the business has to decide which mistake is more costly - misclassifying churned customers as not being at risk of churning, or misclassifying good customers as likely to churn?
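Recall can be sketched the same way. The labels are made up to match the slide (5 actual churners, all of them captured), and `recall` is a hypothetical helper name.

```python
# Hypothetical labels matching the slide: 1 = churned, 0 = not churned.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]


def recall(actual, predicted, positive=1):
    """Of all truly positive observations, the share the model captured."""
    actual_positive = sum(1 for a in actual if a == positive)
    true_positive = sum(
        1 for a, p in zip(actual, predicted) if a == positive and p == positive
    )
    if actual_positive == 0:
        # Assumption: report 0.0 when there are no positives to capture.
        return 0.0
    return true_positive / actual_positive


print(f"Recall: {recall(y_true, y_pred):.0%}")  # all 5 churners captured
```

Flagging every customer as a churner would also give 100% recall, but at the cost of many false alarms, which is the tension with precision described above.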

12. Regression performance

In regression we are predicting a continuous variable - like sales, stock price, or purchases. The key metric is the error, which measures how far away the prediction is from the observed number.

13. Regression example

Let's use a chart again. Here we have a number of data points showing how much product revenue is affected by advertising.

14. Predicting revenue with a line

We can build a straight line and it does pretty well.

15. Regression error

Here are the prediction errors. We can sum up their sizes and divide by the number of data points to calculate the average error.
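The video does not name a specific error metric, so the sketch below assumes mean absolute error, one common choice. Taking absolute values keeps positive and negative errors from cancelling each other out. The revenue numbers are made up for illustration.

```python
# Made-up observed and predicted revenue values, for illustration only.
actual = [10, 12, 15, 18]
predicted = [11, 12, 13, 19]


def mean_absolute_error(actual_values, predicted_values):
    """Average size of the prediction errors, ignoring their direction."""
    errors = [abs(a - p) for a, p in zip(actual_values, predicted_values)]
    return sum(errors) / len(errors)


# Absolute errors are 1, 0, 2 and 1, so the average is 1.0.
print(f"Average error: {mean_absolute_error(actual, predicted)}")
```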

16. Testing non-linear models

Now, what if the error is too high? Then we can try again with a curve. It assumes diminishing marginal effects of advertising.

17. Error improvements

Looks like the errors are smaller with this option.
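To make the comparison concrete, here is a rough sketch that fits both options and compares their average errors. Everything in it is an assumption for illustration: the data is made up to follow a square-root curve exactly (so the curve is guaranteed to win), the curve is modelled as a straight line on the square root of ad spend, and mean absolute error stands in for the unnamed error metric.

```python
import math

# Made-up data with diminishing returns: revenue = 10 * sqrt(ad spend).
ad_spend = [1, 4, 9, 16, 25]
revenue = [10, 20, 30, 40, 50]


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Option 1: a straight line on the raw ad spend.
slope, intercept = fit_line(ad_spend, revenue)
line_pred = [slope * x + intercept for x in ad_spend]

# Option 2: a curve, fitted as a straight line on sqrt(ad spend).
sqrt_spend = [math.sqrt(x) for x in ad_spend]
curve_slope, curve_intercept = fit_line(sqrt_spend, revenue)
curve_pred = [curve_slope * s + curve_intercept for s in sqrt_spend]

mae_line = mean_absolute_error(revenue, line_pred)
mae_curve = mean_absolute_error(revenue, curve_pred)
print(f"Line average error:  {mae_line:.2f}")
print(f"Curve average error: {mae_curve:.2f}")
```

On real data the gap would be smaller and noisier, and the errors should be compared on data the models were not fitted on.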

18. Actionable models - A/B testing

Now, there are many metrics and you need to pick the right one, but model performance is just one side of the coin. No matter how good a model is, it may still not be actionable - be it churn, purchase or machine failure prediction. We must always test models with real-life experiments. For example, target customers predicted to churn with incentives and see if they churn less than the non-targeted customers. Or try reminding people of a product once they have been predicted as likely to buy it. Do they buy it more than the ones without the reminder? If you get improvements using the ML outputs, then your model is valuable and you can build it into automated production systems. If not, back to the drawing board!
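The churn experiment described above boils down to comparing churn rates between two groups. The sketch below uses made-up counts and a hypothetical `churn_rate` helper. A real experiment would also need random assignment to the two groups and a significance check before trusting the difference.

```python
def churn_rate(churned, total):
    """Share of customers in a group who churned."""
    return churned / total


# Made-up experiment results, for illustration only.
# Both groups were predicted to churn; only one received the incentive.
targeted_rate = churn_rate(churned=30, total=200)  # received the incentive
control_rate = churn_rate(churned=50, total=200)   # received nothing

reduction = control_rate - targeted_rate
print(f"Targeted churn rate: {targeted_rate:.0%}")
print(f"Control churn rate:  {control_rate:.0%}")
print(f"Reduction in churn:  {reduction:.0%}")
```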

19. Let's practice!

Great, let's test what we've learned about model performance measurement!