Exercise

Putting it All Together with KittyCatch: Part 2 - Use Graphs to Understand the Outcome

As we saw in the previous exercise, the maximum value of our outcome of interest, the distance users walked, is much larger than the mean, the median, and the 3rd quartile of the dataset. An outlier like this can bias our t-test results, because the test assumes the data are approximately normally distributed, so we need to deal with the outliers in this variable. Let's first chart the current distribution of DistanceWalked. Then we'll apply a method for handling the outliers and make a second chart to see whether the result looks more like a normal distribution.

Instructions
  • 1) Chart the values of our outcome of interest.
  • 2) Examine the highest distances that our sample users walked.
  • 3) Use top coding to help us handle outlier values of DistanceWalked.
  • 4) Create a new chart to see if Step 3 helps make our values look more normal.
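
To make the four steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the distances are made-up numbers rather than the KittyCatch data, the cap of 10.0 is a hypothetical threshold chosen for this example, and the helper names top_code and text_histogram are invented here. The exercise itself may use different tools and a different cutoff.

```python
import statistics


def top_code(values, cap):
    """Top coding: replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def text_histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge (a rough 'chart')."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Made-up distances for illustration only (NOT the real KittyCatch sample).
distance_walked = [1.2, 2.5, 3.1, 3.8, 4.0, 4.4, 5.2, 5.9, 6.3, 48.0]

# Step 1: chart the outcome as it currently stands.
print("before:", text_histogram(distance_walked, 5))

# Step 2: examine the highest distances in the sample.
highest = sorted(distance_walked, reverse=True)[:3]
print("highest:", highest)

# Step 3: top code at a hypothetical cap (an assumption for this sketch).
cap = 10.0
capped = top_code(distance_walked, cap)

# Step 4: chart again and compare the summary statistics.
print("after:", text_histogram(capped, 5))
print("mean before:", statistics.mean(distance_walked))
print("mean after:", statistics.mean(capped))
```

Note that top coding keeps every user in the sample. It only pulls the extreme value down to the cap, so the mean moves closer to the median and the long right tail in the chart is shortened.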