
Robustness to outliers

Measures of central tendency attempt to describe the middle, or center point, of a distribution. In the presence of outliers (extreme values), the median is usually preferred over the mean. The mean is computed from every value, so a single extreme value can "drag" it far up or down. The median is just the middle value of the sorted data, so it depends only on an outlier's position in the ordering and not on its size. An outlier can therefore shift the median only slightly, however extreme it is.
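The short sketch below illustrates this point. It uses Python's standard-library `statistics` module and a small set of made-up values, which are assumptions for illustration only. Adding one extreme value moves the mean a long way, while the median moves by only half a point.

```python
from statistics import mean, median

# Hypothetical values, chosen only to illustrate the effect of one outlier.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

# Without the outlier: mean 3.4, median 3.
print(mean(values), median(values))

# With the outlier: mean 19.5, median 3.5 (average of the two middle values 3 and 4).
print(mean(with_outlier), median(with_outlier))
```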

A person who does not like wine at all takes part in the wine ratings survey and makes a statement by giving the Shiraz the lowest possible score of zero. Let's see how this affects the mean and median of the score distribution.

Instructions

We've made available to you both the original red_wine ratings and red_wine_extreme, which contains the original ratings plus the new extreme rating.

  • Calculate the change in mean rating after adding the new extreme value. Use the mean() function and save the result to diff_mean.
  • Calculate the change in median rating after adding the new extreme value. Use the median() function and save the result to diff_median.
  • Print both differences to see which measure of central tendency is least affected by the addition of the extreme rating.
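The steps above can be sketched in Python with the standard-library `statistics.mean()` and `statistics.median()`. This is an assumption, since the exercise's own environment is not shown here. The ratings below are hypothetical stand-ins for the real red_wine data, which is not given. "Change" is taken to mean the value with the extreme rating minus the original value.

```python
from statistics import mean, median

# Hypothetical ratings: NOT the real red_wine data from the exercise.
red_wine = [85, 88, 90, 91, 87, 92, 89]
# Original ratings plus one extreme score of zero for the Shiraz.
red_wine_extreme = red_wine + [0]

# Change = statistic with the extreme rating minus statistic without it.
diff_mean = mean(red_wine_extreme) - mean(red_wine)
diff_median = median(red_wine_extreme) - median(red_wine)

print(diff_mean)    # roughly -11.1: the zero drags the mean down
print(diff_median)  # -0.5: the median barely moves
```

With these made-up ratings, the mean drops by about 11 points, while the median drops by only half a point.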