
Find the outlaw... Outlier!

As an example of the summary measure approach, we will look at the post-treatment values of the BPRS. The mean of weeks 1 to 8 will be our summary measure. First calculate this measure, then look at boxplots of the measure for each treatment group. Notice that the mean summary measure is more variable in the second treatment group and that its distribution in this group is somewhat skewed. The boxplot of the second group also reveals an outlier: a subject whose mean BPRS score over the eight weeks is above 70. It might bias the conclusions from further comparisons of the groups, so we shall remove that subject from the data. Without the outlier, try to work out which treatment group might have the lower eight-week mean. Considering the variation, how can we be sure?
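To see why a single extreme subject matters, here is a minimal sketch in base R. The numbers are made up for illustration and are not taken from the BPRS data; the names `group_means` and `kept` are hypothetical.

```r
# Hypothetical per-subject summary means for one treatment group
# (made-up numbers, NOT the real BPRS values)
group_means <- c(32, 35, 38, 41, 44, 74)

# Mean and standard deviation with the extreme subject included
mean_all <- mean(group_means)
sd_all <- sd(group_means)

# The same statistics after dropping the value above 70
kept <- group_means[group_means < 70]
mean_kept <- mean(kept)
sd_kept <- sd(kept)

# Compare the two versions side by side
print(c(mean_all = mean_all, mean_kept = mean_kept))
print(c(sd_all = sd_all, sd_kept = sd_kept))
```

In this toy group, one subject pulls the mean upwards and inflates the spread, which is the kind of distortion the exercise asks you to look out for in the boxplot.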

This exercise is part of the course

Helsinki Open Data Science

Exercise instructions

  • Create the summary data BPRSL8S
  • Glimpse the data
  • Draw the boxplot and observe the outlier
  • Find a suitable threshold value and use filter() to exclude the outlier, forming a new data set BPRSL8S1
  • Glimpse and draw a boxplot of the new data to check the outlier has been dealt with
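The filtering step can be sketched on a small stand-in table. This is only an illustration under stated assumptions: `toy_summary` and its values are made up, and the threshold of 70 is taken from the description above (a different cut-off may suit the real boxplot better).

```r
library(dplyr)

# Toy stand-in for BPRSL8S (made-up values, NOT the real BPRS data)
toy_summary <- data.frame(
  treatment = factor(c(1, 1, 1, 2, 2, 2)),
  subject   = factor(c(1, 2, 3, 1, 2, 3)),
  mean      = c(36.2, 41.5, 33.0, 30.1, 73.4, 38.9)
)

# Assumed threshold: the description mentions a mean score over 70
threshold <- 70

# Keep only the subjects whose summary mean is below the threshold
toy_filtered <- toy_summary %>%
  filter(mean < threshold)

# Check that the extreme subject is gone
glimpse(toy_filtered)
```

The same pattern should carry over to the real summary data: filter on the `mean` column, then redraw the boxplot with the filtered data to confirm that the outlier no longer appears.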

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# dplyr, tidyr & ggplot2 packages and BPRSL are available

# Create summary data by treatment and subject, with the mean as the summary variable (ignoring baseline week 0).
BPRSL8S <- BPRSL %>%
  filter(week > 0) %>%
  group_by(treatment, subject) %>%
  summarise( mean=mean(bprs) ) %>%
  ungroup()

# Glimpse the data
glimpse(BPRSL8S)

# Draw a boxplot of the mean versus treatment
ggplot(BPRSL8S, aes(x = treatment, y = mean)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape=23, size=4, fill = "white") +
  scale_y_continuous(name = "mean(bprs), weeks 1-8")

# Create a new data set by filtering out the outlier, then adjust the ggplot code to draw the plot again with the new data
BPRSL8S1 <- "Change me!"