Finding outliers with z-scores

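The normal distribution appears throughout the natural world and is one of the most commonly encountered distributions. Because the z-score of a point is simply its distance from the mean measured in standard deviations, the z-score method is one of the quickest ways to detect outliers in normally distributed data.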

Recall the rule of thumb from the video: if a sample is more than three standard deviations away from the mean, you can consider it an extreme value.
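To make the rule of thumb concrete, here is a minimal sketch that computes z-scores by hand with NumPy. The array `sample` and its injected extremes are made-up illustration data, not the exercise's `prices`.

```python
import numpy as np

# Hypothetical data: 1,000 roughly normal values plus two injected extremes
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(loc=50, scale=5, size=1000), [95.0, 5.0]])

# z-score: distance from the mean, measured in standard deviations
z = (sample - sample.mean()) / sample.std()

# Rule of thumb: flag anything more than three standard deviations from the mean
extremes = sample[np.abs(z) > 3]
print(extremes)
```

A few genuine draws from the normal part may also be flagged, since roughly 0.3% of normally distributed values fall beyond three standard deviations by chance.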

However, recall also that the z-score method should be approached with caution. This method is appropriate only when we are confident our data comes from a normal distribution. Otherwise, the results might be misleading.
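The following sketch illustrates why the normality assumption matters. It applies the same three-standard-deviation cutoff to a normal sample and to a strongly skewed (exponential) sample of the same size. Both samples are synthetic and the helper `count_flagged` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=10_000)
skewed_sample = rng.exponential(size=10_000)  # right-skewed, not normal

def count_flagged(values):
    """Count points whose absolute z-score exceeds 3."""
    z = (values - values.mean()) / values.std()
    return int((np.abs(z) > 3).sum())

n_normal = count_flagged(normal_sample)
n_skewed = count_flagged(skewed_sample)
print(n_normal, n_skewed)
```

On the skewed sample, far more points cross the cutoff, even though they are ordinary members of the long right tail rather than anomalies.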

The prices distribution has been loaded for you.

This exercise is part of the course

Anomaly Detection in Python

Exercise instructions

  • Import the zscore function from the relevant scipy module.
  • Find the z-scores of prices and store them into scores.
  • Create a boolean mask named is_over_3 to check if the absolute values of scores are greater than 3.
  • Use the mask to filter prices for outliers.
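As a rough illustration of the same four steps on made-up data, the sketch below uses `scipy.stats.zscore` on a synthetic array. The names `demo_prices`, `demo_scores`, `demo_mask`, and `demo_outliers` are hypothetical and stand in for the exercise's own variables.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical data: 500 roughly normal prices plus one injected extreme
rng = np.random.default_rng(42)
demo_prices = np.concatenate([rng.normal(loc=100, scale=10, size=500), [250.0]])

# Standardize every value against the sample mean and standard deviation
demo_scores = zscore(demo_prices)

# Boolean mask: True where the absolute z-score exceeds 3
demo_mask = np.abs(demo_scores) > 3

# Boolean indexing keeps only the flagged values
demo_outliers = demo_prices[demo_mask]
print(len(demo_outliers))
```

The absolute value matters here: without it, the mask would catch only unusually high values and miss unusually low ones.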

Interactive exercise

Try this exercise by completing the sample code below.

# Import the zscore function
from scipy.____ import ____

# Find the z-scores of prices
scores = ____(____)

# Check if the absolute values of scores are over 3
is_over_3 = ____

# Use the mask to subset prices
outliers = ____[____]

print(len(outliers))