Handle outliers with standard deviation

The basetable has one variable, "age". Donors fill in their age manually in an online form, so the variable is prone to typing errors and can contain outliers. Replace every value lower than the mean age minus 3 times the standard deviation of age with that lower limit, and replace every value higher than the mean age plus 3 times the standard deviation of age with that upper limit.
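
The rule can also be written as a formula. The symbols here are our own shorthand, not part of the exercise: x̄ is the mean of "age" and s is its standard deviation.

```latex
\[
\text{lower} = \bar{x}_{\text{age}} - 3\, s_{\text{age}}, \qquad
\text{upper} = \bar{x}_{\text{age}} + 3\, s_{\text{age}}, \qquad
\text{age\_mod} = \min\bigl(\max(\text{age},\ \text{lower}),\ \text{upper}\bigr)
\]
```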

This exercise is part of the course

Intermediate Predictive Analytics in Python

Exercise instructions

  • Print the maximum value of "age".
  • Calculate the mean and standard deviation of "age".
  • Calculate the lower and upper limits using the standard deviation rule of thumb.
  • Add a variable "age_mod" to the basetable with outliers replaced, and print the new maximum value of "age_mod".
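
The steps above can be sketched on a small made-up table. This is an illustration only, not the exercise solution: the `toy` DataFrame and its values are hypothetical, and the capping uses pandas' `Series.clip` rather than the list comprehension in the exercise template.

```python
import pandas as pd

# Hypothetical toy data: 19 plausible ages plus one typing error (350).
toy = pd.DataFrame({"age": [30, 40, 50, 60] * 4 + [30, 40, 50, 350]})

# Show the maximum age before any correction
print(toy["age"].max())

# Mean and standard deviation of age
# (pandas' Series.std() uses the sample standard deviation, ddof=1, by default)
mean_age = toy["age"].mean()
std_age = toy["age"].std()

# Limits from the 3-standard-deviation rule of thumb
lower_limit = mean_age - 3 * std_age
upper_limit = mean_age + 3 * std_age

# Series.clip sets values below lower_limit to lower_limit
# and values above upper_limit to upper_limit
toy["age_mod"] = toy["age"].clip(lower=lower_limit, upper=upper_limit)

# The new maximum equals the upper limit, because 350 was capped
print(toy["age_mod"].max())
```

A single extreme value can only exceed the 3-standard-deviation limit when the table has enough rows, because the outlier itself inflates the standard deviation. That is why the toy table has 20 rows rather than a handful.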

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

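# pandas is needed for pd.Series below; basetable is assumed to be already loaded
import pandas as pd

# Show the maximum age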
print(___["___"].___())

# Calculate mean and standard deviation of age
mean_age = ____["____"].____()
std_age = ____["____"].____()

# Calculate the lower and upper limits
lower_limit = ____ - ____ * ____
upper_limit = ____ + ____ * ____

# Add a variable age_mod to the basetable with outliers replaced
basetable["age_mod"] = (pd.Series([____(____(____, ____), ____)
                             for a in basetable["age"]]))

# Show the new maximum age
print(___["___"].___())