Exercise

Putting it All Together with KittyCatch: Part 1 - Explore the Data

Experiments often don't go as planned. This long, multi-step exercise tests your ability to answer a research question when the data have realistic problems.

KittyCatch is a location-based augmented reality game developed by Meowtec. In this game, players use their cell phones' GPS abilities to locate and capture virtual kittens. Although KittyCatch is free to play, Meowtec generates revenue from KittyCatch through pop-up ads, which appear about once for every half-mile the player walks while using the application, up to 5 times a day. Since Meowtec's revenue is highly dependent on how far people walk while playing KittyCatch, their primary goal for future updates to KittyCatch is to incentivize play.
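To make the revenue logic concrete, here is a minimal sketch of the ad rule as described above: roughly one ad per half-mile walked, capped at 5 per day. The function name `ads_shown` and its arguments are hypothetical, and the text says "about once", so treat the exact rounding as an assumption.

```r
# Hypothetical helper: approximate number of pop-up ads shown in a day,
# assuming exactly one ad per full half-mile walked and a cap of 5 per day.
ads_shown <- function(miles_walked, miles_per_ad = 0.5, daily_cap = 5) {
  pmin(daily_cap, floor(miles_walked / miles_per_ad))
}

# Works on a vector of daily distances (in miles)
ads_shown(c(0.4, 1.2, 10))
```

Under this rule, extra walking only adds ad impressions until a player has walked about 2.5 miles in a day, which is one reason the distribution of distances walked matters and not just its mean.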

One strategy to increase users' play time is to increase the distance between KittyCatch's "Points of Kinterest"; that is, increase the distance that users must travel to capture new kittens. Many players stop their sessions of KittyCatch before they catch a single kitten, and may be turned off by the need to travel farther to catch a kitten. However, Meowtec believes that the loss of play time for these players would be offset by the many other players who consistently travel as far as needed to catch a kitten.

To test this hypothesis, Meowtec constructs an experiment: for a small random sample of players in Springfield, Massachusetts, Meowtec increases the distance that players must walk to reach a Point of Kinterest from 0.25 miles to 0.5 miles. They compare these players to an equally sized random sample of players in Springfield, Massachusetts who keep the default distance between Points of Kinterest (0.25 miles). However, as they explore their data, they find a few problems with their experiment.

Using the KittyCatch dataset and the annotations in the R workspace, help Meowtec determine a valid and reliable average treatment effect from their experiment. Specifically, balance the data and correct any coding errors in the original dataset before estimating the average treatment effect. This problem takes many more steps than previous exercises, but those steps are more representative of what data scientists do when working with experimental data. First, though, we need to get familiar with the data.

Instructions
  • 1) Take a look at the variable names and dataframe structure.
  • 2) See what a "naive" treatment effect looks like.
  • 3) Run a t-test to see a different early calculation of the treatment effect.
  • 4) Examine the values of our outcome of interest, the distance the users walk.
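
The four steps above might look roughly like the sketch below. It runs on a small simulated data frame, not the real KittyCatch data. The data frame name `kitty_demo` and the columns `treatment` and `distance_walked` are hypothetical stand-ins, so substitute the actual names you find in the workspace.

```r
# Simulated stand-in for the KittyCatch dataset. The real column names and
# values in the workspace may differ; everything here is illustrative only.
set.seed(42)
n <- 200
kitty_demo <- data.frame(
  # hypothetical indicator: 1 = 0.5-mile spacing, 0 = default 0.25-mile spacing
  treatment = rep(c(0, 1), each = n / 2),
  # hypothetical outcome: miles walked per player, truncated at zero
  distance_walked = pmax(0, c(rnorm(n / 2, mean = 1.0, sd = 0.4),
                              rnorm(n / 2, mean = 1.2, sd = 0.4)))
)

# 1) Variable names and data frame structure
names(kitty_demo)
str(kitty_demo)

# 2) "Naive" treatment effect: difference in mean outcome between groups
group_means <- tapply(kitty_demo$distance_walked, kitty_demo$treatment, mean)
naive_ate <- unname(group_means["1"] - group_means["0"])
naive_ate

# 3) Welch two-sample t-test of the outcome by treatment group
tt <- t.test(distance_walked ~ treatment, data = kitty_demo)
tt

# 4) Distribution of the outcome of interest
summary(kitty_demo$distance_walked)
```

The naive estimate in step 2 and the t-test in step 3 use the same two group means; the t-test adds a standard error and confidence interval. Note that `t.test()` with a formula reports the first group minus the second, so its sign is flipped relative to `naive_ate` here. Neither estimate accounts for imbalance or coding errors, which is why step 4 inspects the raw outcome values.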