Handle outliers with winsorization
Given is a basetable with two variables: "sum\_donations" and "donor\_id". "sum_donations can contain outliers when donors have donated exceptional amounts. Therefore, you want to winsorize this variable such that the 5% highest amounts are replaced by the upper 5% percentile value.
Diese Übung ist Teil des Kurses
Intermediate Predictive Analytics in Python
Anleitung zur Übung
- Print the minimum value of
sum_donationsand verify that it is at least 0. Then print the maximum value ofsum_donations. - Fill out the appropriate lower limit percentile. As all values higher than 0 are realistic and occur often, it is not necessary to replace values lower than the lower limit percentile value.
- Create a new variable "sum_donations_winsorized" that is a winsorized version of the "sum_donations" variable.
- Print the maximum value of
sum_donations_winsorized.
Interaktive Übung
Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.
from scipy.stats.mstats import winsorize
# Check minimum sum of donations
print(____["____"].____())
print(____["____"].____())
# Fill out the lower limit
lower_limit = ____
# Winsorize the variable sum_donations
basetable["sum_donations_winsorized"] = ____(____["____"], limits=[lower_limit, 0.05])
# Check maximum sum of donations after winsorization
print(____["____"].____())