Identify outliers
Consider the distribution, shown here, of the life expectancies of the countries in Asia. The box plot identifies one clear outlier: a country with a notably low life expectancy. Do you have a guess as to which country this might be? Test your guess in the console using either min() or filter(), then proceed to building a plot with that country removed.
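As a rough sketch of the two console checks mentioned above, the snippet below runs both on a small made-up data frame rather than on gap2007 itself. The data frame toy, its rows, and the column names country and continent are assumptions for illustration (lifeExp is the name used in the sample code); the real data may be laid out differently.

```r
library(dplyr)

# Hypothetical stand-in for gap2007: made-up countries and values.
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = c("Asia", "Asia", "Asia", "Europe"),
  lifeExp   = c(72.5, 44.0, 68.1, 79.3)
)

# Keep only the Asian rows.
asia <- toy %>%
  filter(continent == "Asia")

# Option 1: min() gives the lowest life expectancy, but not the country.
lowest <- min(asia$lifeExp)

# Option 2: filter() keeps the whole row, so the country name is visible.
lowest_row <- asia %>%
  filter(lifeExp == min(lifeExp))
```

min() answers "how low?", while filter() answers "which country?", so the second form is usually the more direct way to check a guess.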
This exercise is part of the course
Exploratory Data Analysis in R
Exercise instructions
gap2007 is still available in your workspace.
- Apply a filter so that it only contains observations from Asia, then create a new variable called is_outlier that is TRUE for countries with life expectancy less than 50. Assign the result to gap_asia.
- Filter gap_asia to remove all outliers, then create another box plot of the remaining life expectancies.
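The flag-then-filter pattern described in these steps can be sketched on made-up data as follows. This is only an illustration under assumptions: toy, its values, and the column names country and continent are hypothetical, and the plotting step is left out.

```r
library(dplyr)

# Hypothetical stand-in data: made-up countries and values.
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  continent = c("Asia", "Asia", "Asia", "Europe"),
  lifeExp   = c(72.5, 44.0, 68.1, 79.3)
)

# Step 1: restrict to one continent, then add a logical flag column.
flagged <- toy %>%
  filter(continent == "Asia") %>%
  mutate(is_outlier = lifeExp < 50)

# Step 2: keep only the rows that are not flagged.
kept <- flagged %>%
  filter(!is_outlier)
```

Storing the flag as a column, rather than filtering on lifeExp directly, keeps the outlier rule in one place and lets later steps reuse it.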
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Filter for Asia, add column indicating outliers
gap_asia <- ___ %>%
  filter(___) %>%
  mutate(___ = ___)

# Remove outliers, create box plot of lifeExp
gap_asia %>%
  filter(___) %>%
  ggplot(aes(x = ___, y = ___)) +
  ___