Histograms for outlier detection
A histogram can be a compelling visual for finding outliers. They can become apparent when an appropriate number of bins is chosen for the histogram. Recall that the square root of the number of observations can be used as a rule of thumb for setting the number of bins. Usually, the bins with the lowest heights will contain outliers.
In this exercise, you'll plot the histogram of prices
from the previous exercise. numpy
and matplotlib.pyplot
are available under their standard aliases.
This is a part of the course
“Anomaly Detection in Python”
Exercise instructions
- Find the square root of the length of
prices
and store it asn_bins
. - Cast
n_bins
to an integer. - Create a histogram of
prices
, setting the number of bins ton_bins
.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Find the square root of the length of prices
n_bins = ____
# Cast to an integer
n_bins = ____(____)
plt.figure(figsize=(8, 4))
# Create a histogram
plt.____(____, ____=____, color='red')
plt.show()
This exercise is part of the course
Anomaly Detection in Python
Detect anomalies in your data analysis and expand your Python statistical toolkit in this four-hour course.
This chapter covers techniques to detect outliers in 1-dimensional data using histograms, scatterplots, box plots, z-scores, and modified z-scores.
Exercise 1: What are anomalies and outliers?Exercise 2: Print a 5-number summaryExercise 3: Histograms for outlier detectionExercise 4: Scatterplots for outlier detectionExercise 5: Box plots and IQRExercise 6: Boxplots for outlier detectionExercise 7: Calculating outlier limits with IQRExercise 8: Using outlier limits for filteringExercise 9: Using z-scores for Anomaly DetectionExercise 10: Finding outliers with z-scoresExercise 11: Using modified z-scores with PyODWhat is DataCamp?
Learn the data skills you need online at your own pace—from non-coding essentials to data science and machine learning.