Finding outliers with z-scores

The normal distribution appears frequently in natural and measured data. This is why the z-score method is often one of the quickest ways to detect outliers.

Recall the rule of thumb from the video: if a sample is more than three standard deviations away from the mean, you can consider it an extreme value.
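
As a minimal sketch of that rule of thumb, the z-score of a value is its distance from the mean, measured in standard deviations. The `sample` array below is made up for illustration and is not the course's prices data.

```python
import numpy as np

# Hypothetical sample: twenty values near 10 plus one extreme value.
sample = np.array([10.0] * 10 + [11.0] * 10 + [50.0])

mean = sample.mean()
std = sample.std()  # population standard deviation (ddof=0)

# z-score: how many standard deviations each value lies from the mean
z = (sample - mean) / std

# Rule of thumb: flag values more than three standard deviations away
extreme = sample[np.abs(z) > 3]
print(extreme)
```

Only the value 50 lies more than three standard deviations from the mean here, so it is the only one flagged.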

However, recall also that the z-score method should be approached with caution. This method is appropriate only when we are confident our data comes from a normal distribution. Otherwise, the results might be misleading.
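
To see why the caution matters, here is a rough sketch on synthetic, strongly right-skewed data. The log-normal sample and the seed are assumptions chosen purely for illustration. On data like this, the three-standard-deviation rule flags only values in the long upper tail and nothing on the lower side, which is not what you would expect from a symmetric, normal distribution.

```python
import numpy as np

# Synthetic right-skewed data (assumption: log-normal, for illustration only)
rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Same z-score computation as for normally distributed data
z = (skewed - skewed.mean()) / skewed.std()

# Count flagged values on each side of the mean
n_high = int((z > 3).sum())
n_low = int((z < -3).sum())
print(n_high, n_low)
```

Because every value is positive and the mean is less than three standard deviations above zero, no value can reach a z-score below -3 in this sample, so all flagged points come from the upper tail.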

The prices distribution has been loaded for you.

This exercise is part of the course

Anomaly Detection in Python

Instructions

  • Import the zscore function from the relevant scipy module.
  • Find the z-scores of prices and store them into scores.
  • Create a boolean mask named is_over_3 to check if the absolute values of scores are greater than 3.
  • Use the mask to filter prices for outliers.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the zscore function
from scipy.____ import ____

# Find the z-scores of prices
scores = ____(____)

# Check if the absolute values of scores are over 3
is_over_3 = ____

# Use the mask to subset prices
outliers = ____[____]

print(len(outliers))
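
For reference, the same steps can be sketched end to end on synthetic data. The `fake_prices` array, its parameters, and the three injected extreme values are hypothetical stand-ins, not the course's `prices` dataset.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical stand-in for prices: roughly normal values around 100
rng = np.random.default_rng(0)
fake_prices = rng.normal(loc=100.0, scale=5.0, size=500)

# Inject three obviously extreme values (illustrative only)
fake_prices[:3] = [300.0, 320.0, 5.0]

# Find the z-scores
scores = zscore(fake_prices)

# Boolean mask: absolute z-score greater than 3
is_over_3 = np.abs(scores) > 3

# Use the mask to subset the data
outliers = fake_prices[is_over_3]

print(len(outliers))
```

In this sketch the three injected values are the ones the mask picks out, since the remaining points stay well within three standard deviations of the mean.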