Handle outliers with standard deviation
You are given a basetable with one variable, "age". The age is filled out manually in an online form by the donor, so it is prone to typing errors and can contain outliers. Replace all values lower than the mean age minus 3 times the standard deviation of age with that lower limit, and replace all values higher than the mean age plus 3 times the standard deviation of age with that upper limit.
This exercise is part of the course
Intermediate Predictive Analytics in Python
Exercise instructions
- Print the maximum value of "age".
- Calculate the mean and standard deviation of "age".
- Calculate the lower and upper limits using the standard deviation rule of thumb.
- Add a variable "age_mod" to the basetable with outliers replaced, and print the new maximum value of "age_mod".
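The steps above can be sketched on a small made-up basetable. The ages below are hypothetical and are not the course data. The nested `min`/`max` call is just one way to cap a value between two limits, and `std()` here is the pandas default (sample standard deviation).

```python
import pandas as pd

# Hypothetical data: 40 plausible ages plus one typing error (450)
ages = [30 + (i % 20) for i in range(40)] + [450]
basetable = pd.DataFrame({"age": ages})

# Show the maximum age before any correction
print(basetable["age"].max())

# Mean and standard deviation of age
mean_age = basetable["age"].mean()
std_age = basetable["age"].std()

# Limits from the "mean plus or minus 3 standard deviations" rule of thumb
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age

# Cap each age: values below lower_limit or above upper_limit become the limit
basetable["age_mod"] = pd.Series(
    [min(max(a, lower_limit), upper_limit) for a in basetable["age"]]
)
print(basetable["age_mod"].max())
```

With very few rows a single extreme value inflates the standard deviation so much that it may stay inside the limits, which is why this sketch uses about forty regular values.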
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Show the maximum age
print(___["___"].___())
# Calculate mean and standard deviation of age
mean_age = ____["____"].____()
std_age = ____["____"].____()
# Calculate the lower and upper limits
lower_limit = ____ - ____ * ____
upper_limit = ____ + ____ * ____
# Add a variable age_mod to the basetable with outliers replaced
basetable["age_mod"] = (pd.Series([____(____(____, ____), ____)
for a in basetable["age"]]))
print(___["___"].___())
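As an alternative to building a new `pd.Series` from a list, pandas' `Series.clip` caps values at both limits in one call. A minimal sketch on hypothetical data (the ages and the index values are made up) is shown here. It keeps the original index, which can matter because a freshly built Series gets a default 0..n-1 index and is aligned on index labels when assigned to a column.

```python
import pandas as pd

# Hypothetical data with a non-default index (for example donor IDs)
ages = [30 + (i % 20) for i in range(40)] + [450]
basetable = pd.DataFrame({"age": ages}, index=range(100, 141))

mean_age = basetable["age"].mean()
std_age = basetable["age"].std()
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age

# clip replaces values below lower_limit or above upper_limit with that limit
basetable["age_mod"] = basetable["age"].clip(lower=lower_limit, upper=upper_limit)
print(basetable["age_mod"].max())
```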