Handle outliers with standard deviation
You are given a basetable that has one variable, "age". The age is entered manually by the donor in an online form, so it is prone to typing errors and can contain outliers. Replace all values that are lower than the mean age minus 3 times the standard deviation of age with that lower limit, and replace all values that are higher than the mean age plus 3 times the standard deviation of age with that upper limit.
This exercise is part of the course
Intermediate Predictive Analytics in Python
Exercise instructions
- Print the maximum value of "age".
- Calculate the mean and standard deviation of "age".
- Calculate the lower and upper limits using the standard deviation rule of thumb.
- Add a variable "age_mod" to the basetable with outliers replaced, and print the new maximum value of "age_mod".
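The steps above can be sketched on a small made-up table. This is only an illustration under stated assumptions: the DataFrame `toy` and its values are hypothetical stand-ins for the course's basetable, and the factor 3 is the rule of thumb named in the exercise.

```python
import pandas as pd

# Hypothetical toy data (not the course's basetable):
# 40 plausible ages between 30 and 49, plus one typing error (340).
ages = [30 + (i % 20) for i in range(40)] + [340]
toy = pd.DataFrame({"age": ages})

# Maximum before handling outliers
print(toy["age"].max())

# Mean and standard deviation of age
mean_age = toy["age"].mean()
std_age = toy["age"].std()  # pandas computes the sample std (ddof=1) by default

# Limits from the "mean plus or minus 3 standard deviations" rule of thumb
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age

# Replace values below the lower limit by the lower limit,
# and values above the upper limit by the upper limit
toy["age_mod"] = pd.Series(
    [min(max(a, lower_limit), upper_limit) for a in toy["age"]]
)

# Maximum after handling outliers: the typing error is capped at upper_limit
print(toy["age_mod"].max())
```

Values inside the limits are left unchanged; only the extreme entry is pulled back to the upper limit.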
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Show the maximum age
print(___["___"].___())
# Calculate mean and standard deviation of age
mean_age = ____["____"].____()
std_age = ____["____"].____()
# Calculate the lower and upper limits
lower_limit = ____ - ____ * ____
upper_limit = ____ + ____ * ____
# Add a variable age_mod to the basetable with outliers replaced
basetable["age_mod"] = (pd.Series([____(____(____, ____), ____)
for a in basetable["age"]]))
print(___["___"].___())
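
As an aside, the element-by-element replacement in the template can also be written with pandas' `Series.clip`. The sketch below is an alternative, not the exercise's intended answer, and it again uses a hypothetical `toy` table rather than the course's basetable.

```python
import pandas as pd

# Hypothetical toy data (not the course's basetable)
toy = pd.DataFrame({"age": [30 + (i % 20) for i in range(40)] + [340]})

mean_age = toy["age"].mean()
std_age = toy["age"].std()
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age

# Vectorized capping: values outside [lower_limit, upper_limit] are set to the nearest limit
toy["age_mod"] = toy["age"].clip(lower=lower_limit, upper=upper_limit)

# Same result as the min/max list comprehension, up to floating-point tolerance
loop_version = pd.Series(
    [min(max(a, lower_limit), upper_limit) for a in toy["age"]]
)
max_difference = (toy["age_mod"] - loop_version).abs().max()
print(max_difference < 1e-9)
```

Using `clip` avoids the Python-level loop and keeps the original index of the column, which matters if the basetable's index is not the default 0, 1, 2, ... range.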