Finding outliers with z-scores
The normal distribution describes many naturally occurring measurements well and is one of the most commonly encountered distributions. Because its shape is fully determined by its mean and standard deviation, the z-score method can be one of the quickest ways to detect outliers.
Recall the rule of thumb from the video: if a data point is more than three standard deviations away from the mean, you can consider it an extreme value.
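In symbols, a value x has z-score z = (x − mean) / std, and the rule of thumb flags any point with |z| > 3. The sketch below computes this by hand on a small made-up sample (not the course data) and checks it against scipy.stats.zscore, which by default also uses the population standard deviation (ddof=0).

```python
import numpy as np
from scipy.stats import zscore

# Made-up sample for illustration: twenty values near 10 and one far-off value
sample = np.array([10.0] * 10 + [11.0] * 10 + [100.0])

# z = (x - mean) / std; np.std uses the population std (ddof=0) by default
z = (sample - sample.mean()) / sample.std()

# scipy.stats.zscore computes the same quantity with its default ddof=0
assert np.allclose(z, zscore(sample))

# Rule of thumb: flag points more than three standard deviations from the mean
extreme = sample[np.abs(z) > 3]
print(extreme)  # [100.]
```

Here only 100 has |z| above 3 (about 4.5); the other values sit within roughly a quarter of a standard deviation of the mean.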
However, recall also that the z-score method should be approached with caution. This method is appropriate only when we are confident our data comes from a normal distribution. Otherwise, the results might be misleading.
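To illustrate why the caution matters, the sketch below applies the same |z| > 3 rule to two synthetic samples: one normal and one right-skewed (lognormal). Both datasets are assumptions made up for this illustration, and neither contains injected anomalies.

```python
import numpy as np
from scipy.stats import zscore

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data (assumption): a normal sample and a right-skewed lognormal sample
normal_data = rng.normal(loc=0.0, scale=1.0, size=100_000)
skewed_data = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)


def share_flagged(x):
    """Fraction of points that the |z| > 3 rule marks as outliers."""
    return np.mean(np.abs(zscore(x)) > 3)


# Theoretical rate for a normal distribution is about 0.0027
print(f"normal: {share_flagged(normal_data):.4f}")
print(f"skewed: {share_flagged(skewed_data):.4f}")
```

On the skewed sample the rule flags a noticeably larger share of points, and every flagged point lies in the long right tail, even though nothing unusual was added to either sample.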
The prices distribution has been loaded for you.
This exercise is part of the course
Anomaly Detection in Python
Exercise instructions
- Import the zscore function from the relevant scipy module.
- Find the z-scores of prices and store them in scores.
- Create a boolean mask named is_over_3 to check if the absolute values of scores are greater than 3.
- Use the mask to filter prices for outliers.
Interactive exercise
Try this exercise by completing the sample code below.
# Import the zscore function
from scipy.____ import ____
# Find the z-scores of prices
scores = ____(____)
# Check if the absolute values of scores are over 3
is_over_3 = ____
# Use the mask to subset prices
outliers = ____[____]
print(len(outliers))
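One possible way to complete the blanks is sketched below. Because the preloaded prices data is not available here, the sketch uses a hypothetical stand-in: a NumPy array of 1,000 made-up prices around 100 with two injected extreme values. It assumes scipy.stats.zscore and uses NumPy's abs to build the mask.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical stand-in for the preloaded `prices` data:
# 1,000 made-up prices around 100 plus two injected extreme values
rng = np.random.default_rng(42)
prices = np.concatenate([rng.normal(loc=100, scale=15, size=1_000), [950.0, 1200.0]])

# Find the z-scores of prices
scores = zscore(prices)

# Check if the absolute values of scores are over 3
is_over_3 = np.abs(scores) > 3

# Use the mask to subset prices
outliers = prices[is_over_3]
print(len(outliers))  # 2: only the two injected values exceed |z| > 3
```

The two injected values inflate the standard deviation to roughly 46, so ordinary prices near 100 stay far below the cutoff while 950 and 1200 land well above it.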