Finding outliers with z-scores
The normal distribution appears throughout the natural world and is one of the most commonly encountered distributions. For data that is approximately normal, this makes the z-score method one of the quickest ways to detect outliers.
Recall the rule of thumb from the video: if a sample is more than three standard deviations away from the mean, you can consider it an extreme value.
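To make the rule of thumb concrete, here is a minimal sketch that computes z-scores by hand as z = (x - mean) / standard deviation. The values array is a hypothetical sample invented for illustration, not the course data.

```python
import numpy as np

# Hypothetical sample: 50 evenly spaced values plus one extreme value.
values = np.append(np.linspace(90, 110, 50), 200.0)

# z-score: distance from the mean, measured in standard deviations.
# values.std() uses the population standard deviation (ddof=0).
z = (values - values.mean()) / values.std()

# Rule of thumb: an absolute z-score above 3 marks an extreme value.
is_extreme = np.abs(z) > 3
print(values[is_extreme])
```

In this sample only the value 200 lies more than three standard deviations from the mean, so it is the only one flagged.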
However, recall also that the z-score method should be approached with caution. This method is appropriate only when we are confident our data comes from a normal distribution. Otherwise, the results might be misleading.
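As a rough illustration of how the results can mislead, the sketch below applies the same threshold to strongly right-skewed data. The skewed array is a hypothetical example built from evenly spaced quantiles of an exponential distribution, so every point is an ordinary member of that distribution rather than a true anomaly.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical skewed data: 1000 evenly spaced quantiles of an
# exponential distribution (no injected anomalies).
n = 1000
p = (np.arange(n) + 0.5) / n
skewed = -np.log(1 - p)

scores = zscore(skewed)
flagged = np.abs(scores) > 3

# Under a normal distribution, only about 0.3% of points exceed |z| > 3.
share = flagged.mean()
print(f"{share:.1%} of points flagged")
```

The share of flagged points here is several times larger than a normal distribution would predict, and all of them sit in the long right tail, which is why checking the shape of the data first is worthwhile.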
The prices distribution has been loaded for you.
This exercise is part of the course
Anomaly Detection in Python
Exercise instructions
- Import the zscore function from the relevant scipy module.
- Find the z-scores of prices and store them into scores.
- Create a boolean mask named is_over_3 to check if the absolute values of scores are greater than 3.
- Use the mask to filter prices for outliers.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the zscore function
from scipy.____ import ____
# Find the z-scores of prices
scores = ____(____)
# Check if the absolute values of scores are over 3
is_over_3 = ____
# Use the mask to subset prices
outliers = ____[____]
print(len(outliers))
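For reference, here is one way the template above might be completed, shown as a sketch rather than the official solution. It uses zscore from scipy.stats. Because the course's prices data is not available here, the prices array below is a hypothetical stand-in; the sketch assumes the real prices behaves like a NumPy array or pandas Series.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical stand-in for the preloaded prices: 200 typical values
# plus two extreme ones. The real course data will differ.
prices = np.concatenate(
    [np.linspace(150_000, 250_000, 200), [900_000, 1_200_000]]
)

# Find the z-scores of prices
scores = zscore(prices)

# Check if the absolute values of scores are over 3
is_over_3 = np.abs(scores) > 3

# Use the mask to subset prices
outliers = prices[is_over_3]
print(len(outliers))
```

With this stand-in data, only the two injected extreme prices end up in outliers.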