Measuring performance
1. Measuring performance
In the previous video and exercise, you saw how to visually compare an observed anomaly score against a set of known labels. In this video, we'll learn how to use numerical summaries to measure the performance of anomaly scores.
2. Using a decision threshold
Earlier chapters described algorithms that return a continuous measure of how anomalous an observation is. In practice, it's useful to convert this continuous measure into a binary value. A common method is to choose a high threshold value so that only the small proportion of data examples that exceed the threshold are classed as anomalies. The quantile() function finds percentiles of a distribution and is a convenient way to choose a threshold. The code example below shows how quantile() is used. The argument probs is a value between 0 and 1 determining the percentile to calculate. The returned value, high_score, is 0.623, which is the 99th percentile of the distribution of the input sat$score. In a second step, a new column, binary_score, is created that takes the value 0 if sat$score is less than high_score, and 1 otherwise.
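As a minimal sketch of these two steps in R, assuming the satellite data lives in a data frame called sat with the anomaly scores in a numeric column sat$score (the names used in the video):

# Choose the 99th percentile of the anomaly scores as the threshold
high_score <- quantile(sat$score, probs = 0.99)  # 0.623 in the video

# Binarize the score: 0 below the threshold, 1 otherwise
sat$binary_score <- as.numeric(sat$score >= high_score)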
3. Tables of agreement
The binarized score can be used to assess anomaly detection performance by comparing it with the true labels. The table() function takes two vectors as input and counts the number of times each unique combination of values occurs across the two inputs. When the true label and binarized score are passed to table(), as shown below, the resulting table summarizes the agreement between the score and the true label. The table's rows are indexed by the values of the first argument, sat$label, while the columns are indexed by the values of the second argument, binary_score. Reading from left to right along the bottom row: of the 71 items labeled 1, 15 were scored as non-anomalous and 56 were correctly detected. But how good is this? Next, we'll discuss further metrics based on this table that will help us understand the performance.
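Continuing the sketch above, the cross-tabulation can be built like this (the variable name tab is introduced here for later use; the counts shown in the comment are the ones quoted in the video, and the count of correctly scored normal items is not quoted there):

# Rows follow the first argument (true labels), columns the second (binary score)
tab <- table(sat$label, sat$binary_score)
tab
#        0    1
#   0    .    3    (top-left count not quoted in the video)
#   1   15   56    (true anomalies: 15 missed, 56 detected)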
4. Recall
Recall is a proportion found by dividing the number of correctly identified anomalies by the total number of true anomalies. A perfect recall of 1 would indicate that every real anomaly is detected by the algorithm. The recall for the isolation forest score is calculated using two pieces of information from the table: the number of satellite images with label 1 whose binary score is also 1, which from the table is 56, and the total number of true anomalies, which is the sum of the bottom row of the table, 15 plus 56. The resulting recall is approximately 0.79.
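Using the tab object from the earlier sketch, recall can be computed directly from the table's counts:

# Correctly detected anomalies divided by all true anomalies
recall <- tab["1", "1"] / sum(tab["1", ])
recall  # 56 / (15 + 56), approximately 0.79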
5. Precision
Precision is found by dividing the number of correctly labeled items in the anomaly class by the total number of items labeled as anomalous by the anomaly score. Higher precision is better, and a perfect precision of 1 indicates that the detector does not incorrectly label any normal instances as anomalous. To derive the precision from the table output, two pieces of information are required. First, the number of satellite images with label equal to 1 whose binary score is also 1 is taken from the bottom-right corner of the table, here 56. This is then divided by the total number of instances scored as anomalous, which is the sum of the second column, 3 plus 56. The resulting precision is approximately 0.95.
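And precision, again assuming the tab object from the earlier sketch:

# Correctly detected anomalies divided by everything scored as anomalous
precision <- tab["1", "1"] / sum(tab[, "1"])
precision  # 56 / (3 + 56), approximately 0.95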
6. Let's practice!
Let's practice using these measurement tools to compare the performance of two anomaly detection techniques!