Robustness to outliers
Measures of central tendency attempt to describe the middle or center point of a distribution. In the presence of outliers, or extreme values, the median is usually preferred over the mean. The reason is that every value enters the mean's sum, so a single extreme value can "drag" the mean up or down. The median depends only on the middle of the sorted values, so one outlier shifts it by at most one position in the ordering and usually changes it very little.
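A small sketch can make the "dragging" effect concrete. The numbers below are invented purely for illustration and are not the course's wine ratings; `ratings` and `with_outlier` are hypothetical names.

```r
# Hypothetical ratings, invented for illustration only
ratings <- c(70, 75, 80, 85, 90)

# The same ratings with one extreme low value appended
with_outlier <- c(ratings, 0)

mean(ratings)         # 80
median(ratings)       # 80

mean(with_outlier)    # 400 / 6, roughly 66.67
median(with_outlier)  # 77.5, the average of the two middle values 75 and 80
```

In this toy case the single zero pulls the mean down by more than 13 points, while the median moves by only 2.5.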
A person who does not like wine at all takes part in the wine ratings survey and makes a statement by giving the Shiraz the lowest possible score, zero. Let's see how this affects the mean and median of the score distribution.
This exercise is part of the course
Intro to Statistics with R: Introduction
Exercise instructions
We've made available to you both the original red_wine ratings as well as red_wine_extreme, which contains the original ratings plus the new extreme rating.
- Calculate the change in mean rating after adding the new extreme value. Use the mean() function and save the result to diff_mean.
- Calculate the change in median rating after adding the new extreme value. Use the median() function and save the result to diff_median.
- Print both differences to see which measure of central tendency is less affected by the addition of the extreme rating.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Calculate the change in mean
diff_mean <- ___
# Calculate the change in median
diff_median <- ___
# Print both differences
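
One possible way to fill in the scaffold is sketched below. It is not the course's reference solution. It assumes that red_wine and red_wine_extreme are plain numeric vectors, and it defines stand-in vectors with invented values so the snippet runs on its own; in the exercise environment the real objects are already loaded and those two assignments would be dropped. The sign convention (new value minus original) is also an assumption.

```r
# Stand-in data: hypothetical values, NOT the course's actual ratings.
# Assumption: both objects are plain numeric vectors.
red_wine <- c(82, 85, 88, 90, 91)
red_wine_extreme <- c(red_wine, 0)  # original ratings plus one score of zero

# Calculate the change in mean (after adding the extreme value minus before)
diff_mean <- mean(red_wine_extreme) - mean(red_wine)

# Calculate the change in median (same sign convention)
diff_median <- median(red_wine_extreme) - median(red_wine)

# Print both differences
print(diff_mean)
print(diff_median)
```

If the ratings are instead stored as a column of a data frame, that column would need to be extracted before calling mean() and median().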