
Finding outliers with z-scores

The normal distribution is ubiquitous in the natural world and is one of the most commonly encountered distributions. This is why the z-score method can be one of the quickest ways to detect outliers.

Recall the rule of thumb from the video: if a sample is more than three standard deviations away from the mean, you can consider it an extreme value.
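To make the rule of thumb concrete, here is a minimal sketch that computes z-scores by hand with NumPy. The `values` array is a hypothetical sample made up for illustration; it is not the exercise data.

```python
import numpy as np

# Hypothetical sample (an assumption, not the exercise data):
# twenty ordinary values plus one extreme value
values = np.array([10.0] * 10 + [12.0] * 10 + [50.0])

# z-score: distance from the mean, measured in standard deviations
z = (values - values.mean()) / values.std()

# Flag samples more than three standard deviations away from the mean
is_extreme = np.abs(z) > 3
print(values[is_extreme])
```

Only the value 50.0 lies more than three standard deviations from the mean here, so it is the only sample flagged.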

However, recall also that the z-score method should be approached with caution. This method is appropriate only when we are confident our data comes from a normal distribution. Otherwise, the results might be misleading.
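One possible way to sanity-check the normality assumption before relying on z-scores is a Shapiro-Wilk test via `scipy.stats.shapiro`. This is only a sketch: the choice of test, the 0.05 threshold, and the synthetic samples below are assumptions for illustration, and the course may recommend a different check.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic samples (assumptions for illustration only)
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=500)
skewed_sample = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# shapiro returns a test statistic and a p-value;
# a small p-value is evidence against normality
stat_normal, p_normal = shapiro(normal_sample)
stat_skewed, p_skewed = shapiro(skewed_sample)

print(f"normal sample p-value: {p_normal:.4f}")
print(f"skewed sample p-value: {p_skewed:.4g}")

# Assumed decision threshold of 0.05
if p_skewed < 0.05:
    print("Skewed sample looks non-normal; z-score results may mislead.")
```

A strongly skewed sample like the log-normal one typically yields a very small p-value, which signals that the three-standard-deviation rule should not be trusted on it.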

The prices distribution has been loaded for you.

This exercise is part of the course

Anomaly Detection in Python


Exercise instructions

  • Import the zscore function from the relevant scipy module.
  • Find the z-scores of prices and store them into scores.
  • Create a boolean mask named is_over_3 to check if the absolute values of scores are greater than 3.
  • Use the mask to filter prices for outliers.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the zscore function
from scipy.____ import ____

# Find the z-scores of prices
scores = ____(____)

# Check if the absolute values of scores are over 3
is_over_3 = ____

# Use the mask to subset prices
outliers = ____[____]

print(len(outliers))
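
For reference, the four steps in the instructions could be sketched end to end as follows. The real `prices` data is preloaded in the exercise environment and is not shown here, so this sketch builds a synthetic stand-in array. The generated values and the two injected extremes are assumptions, and the printed count will differ from the exercise's answer.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical stand-in for the preloaded prices data (an assumption):
# 1000 roughly normal prices plus two injected extreme values
rng = np.random.default_rng(42)
prices = np.append(rng.normal(loc=100, scale=10, size=1000), [200.0, 5.0])

# Find the z-scores of prices
scores = zscore(prices)

# Boolean mask: True where the absolute z-score exceeds 3
is_over_3 = np.abs(scores) > 3

# Use the mask to keep only the flagged prices
outliers = prices[is_over_3]

print(len(outliers))
```

The mask has the same length as `prices`, so indexing with it returns only the entries whose absolute z-score exceeds 3, which includes both injected values.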