Handle outliers with standard deviation
You are given a basetable with one variable, "age". Because donors enter their age manually in an online form, the values are prone to typing errors and can contain outliers. Replace every value lower than the mean age minus 3 times the standard deviation of age with that lower limit, and every value higher than the mean age plus 3 times the standard deviation of age with that upper limit.
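Written as formulas, the rule caps each age x_i at the limits below. Here \bar{x} is the mean age and s its standard deviation. By default, pandas' `Series.std()` computes the sample standard deviation (`ddof=1`). Values already inside [lower, upper] pass through unchanged:

```latex
\begin{aligned}
\text{lower} &= \bar{x} - 3s, \qquad \text{upper} = \bar{x} + 3s,\\
x_i^{\text{mod}} &= \min\bigl(\max(x_i,\ \text{lower}),\ \text{upper}\bigr).
\end{aligned}
```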
This exercise is part of the course
Intermediate Predictive Analytics in Python
Exercise instructions
- Print the maximum value of "age".
- Calculate the mean and standard deviation of "age".
- Calculate the lower and upper limits using the standard deviation rule of thumb.
- Add a variable "age_mod" to the basetable with outliers replaced, and print the new maximum value of "age_mod".
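The four steps can be sketched end to end on a small made-up basetable. This is a sketch under stated assumptions. The data is hypothetical: 50 plausible ages plus one typing error, 250. For brevity, the capping step uses pandas' `Series.clip` in place of the `min(max(...))` list comprehension from the exercise. `Series.clip` is a vectorized equivalent of that comprehension.

```python
import pandas as pd

# Hypothetical basetable: 50 plausible ages plus one typing error (250)
basetable = pd.DataFrame({"age": [20, 25, 30, 35, 40] * 10 + [250]})

# Step 1: show the maximum age (here the typo dominates)
print(basetable["age"].max())

# Step 2: mean and sample standard deviation (pandas uses ddof=1 by default)
mean_age = basetable["age"].mean()
std_age = basetable["age"].std()

# Step 3: limits from the 3-standard-deviation rule of thumb
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age

# Step 4: cap values outside [lower_limit, upper_limit] at the nearest limit
basetable["age_mod"] = basetable["age"].clip(lower=lower_limit, upper=upper_limit)
print(basetable["age_mod"].max())
```

With this data, only the value 250 lies above the upper limit, so it is the only row that changes. No value falls below the lower limit.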
Sample code for this exercise:
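import pandas as pd

# basetable is assumed to be a pandas DataFrame with an "age" column
# Show the maximum age
print(basetable["age"].max())
# Calculate mean and standard deviation of age
mean_age = basetable["age"].mean()
std_age = basetable["age"].std()
# Calculate the lower and upper limits
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age
# Add a variable age_mod to the basetable with outliers replaced:
# max() raises values below lower_limit, min() lowers values above upper_limit
basetable["age_mod"] = pd.Series([min(max(a, lower_limit), upper_limit)
                                  for a in basetable["age"]],
                                 index=basetable.index)
# Show the new maximum age
print(basetable["age_mod"].max())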
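The list comprehension produces a plain Python list. Wrapping that list in `pd.Series` without an index gives it the default labels 0..n-1. When you assign a Series as a column, pandas aligns on index labels. If basetable has any other index (for example, after rows were filtered out), the new column therefore fills silently with NaN. Here is a minimal demonstration on a hypothetical three-row frame with made-up limits:

```python
import pandas as pd

# Hypothetical basetable whose index is not the default 0..n-1
basetable = pd.DataFrame({"age": [25, 40, 300]}, index=[101, 102, 103])
lower_limit, upper_limit = 0, 100  # illustrative limits, not computed from data

# Cap each age with the same min(max(...)) pattern as the exercise
capped = [min(max(a, lower_limit), upper_limit) for a in basetable["age"]]

# Without an index, the new Series gets labels 0, 1, 2; assignment aligns
# on labels, none of them match 101..103, so every value becomes NaN
basetable["age_mod_misaligned"] = pd.Series(capped)

# Passing the basetable's own index keeps each value on its original row
basetable["age_mod"] = pd.Series(capped, index=basetable.index)
print(basetable)
```

Two approaches avoid the problem. You can pass `index=basetable.index`, or you can cap with the vectorized `basetable["age"].clip(lower=lower_limit, upper=upper_limit)`, which keeps the original index.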