
Will they run again?

In the previous two exercises, you built models of the net running time. The imagined purpose of those models was to construct a handicapping system for comparing runners of different sexes and ages. Given a runner's age and sex as inputs, the model produces an "expected running time," and this becomes the handicap.

Now, let's imagine another possible purpose for a model: predicting whether a person who participated in the race this year will participate next year. For simplicity, a data frame Ran_twice has been created in your workspace. Ran_twice contains all the people who had run the race twice and provides a variable, runs_again, that indicates whether the person participated the following year (that is, in year three).

Predicting whether a person will run again next year is a very different purpose from finding a typical running time, so the model that serves it can be very different from those in the previous two exercises. In particular:

  1. The output of this model will be TRUE or FALSE, indicating whether the person will participate next year. That is, the response variable will be runs_again.
  2. Net running time, which was the response in the previous exercises, can now serve as an explanatory variable.
  3. The response variable runs_again is categorical, not numeric. Since lm() is intended for quantitative responses, you'll use only the rpart() architecture, which works for both numerical and categorical responses.
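rpart() decides how to build the tree from the type of the response: per its documentation, a factor response gets method = "class", while a plain vector that is not a factor (this includes a logical TRUE/FALSE vector) gets method = "anova", in which case each leaf stores the mean of the response, i.e. the proportion of TRUE. The sketch below illustrates this on simulated data; the data frame d and its values are hypothetical, and only the column names mirror the exercise.

```r
library(rpart)

set.seed(2)
n <- 500
# Hypothetical simulated data; only the column names mirror the exercise
d <- data.frame(age = sample(20:70, n, replace = TRUE),
                net = runif(n, 50, 120))
d$runs_again <- runif(n) < plogis(0.04 * (d$age - 45))

# Logical (TRUE/FALSE) response: rpart() guesses method = "anova",
# so each leaf holds the proportion of TRUE, i.e. an estimated probability
fit_prob <- rpart(runs_again ~ age + net, data = d)

# Factor response: rpart() guesses method = "class" (a classification tree)
fit_class <- rpart(factor(runs_again) ~ age + net, data = d)

fit_prob$method   # "anova"
fit_class$method  # "class"
```

With the logical response, predict(fit_prob, newdata) returns numbers between 0 and 1, which is why a plot of the model's output can be read as a probability.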
Instructions

The rpart package and Ran_twice have already been loaded into your R session.

  • Using rpart() and data = Ran_twice, build a model with response runs_again and explanatory variables age, sex, and net running time. Set the complexity parameter to cp = 0.005. Store the model as run_again_model.
  • Visualize the model as a graph. On that graph, the y-axis shows the probability that the outcome is TRUE.
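A minimal sketch of both steps follows, under stated assumptions. Because the real Ran_twice is only available in the course workspace, the code builds a hypothetical stand-in with simulated values; it also assumes the net running time column is named net and that runs_again is logical. The course's own plotting helper is not reproduced here; instead the sketch draws the model's predicted probability of TRUE against age, one line per sex, holding net at its median.

```r
library(rpart)

set.seed(1)
n <- 2000
# Hypothetical stand-in for Ran_twice: simulated values, not the real race data
Ran_twice <- data.frame(
  age = sample(20:70, n, replace = TRUE),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  net = round(runif(n, 50, 120), 1)  # assumed column name for net running time
)
# Assumed relationship used only to generate the simulated response
p_return <- plogis(-1 + 0.03 * (Ran_twice$age - 40) - 0.02 * (Ran_twice$net - 85))
Ran_twice$runs_again <- runif(n) < p_return  # logical TRUE/FALSE response

# Step 1: the model requested by the exercise (cp is passed on to rpart.control)
run_again_model <- rpart(runs_again ~ age + sex + net, data = Ran_twice, cp = 0.005)

# Step 2: evaluate the model over a grid of ages for each sex, net held at its median
ages <- 20:70
grid <- expand.grid(age = ages, sex = levels(Ran_twice$sex),
                    net = median(Ran_twice$net))
grid$prob <- predict(run_again_model, newdata = grid)  # proportion of TRUE in each leaf

# Step-shaped lines, since a tree's output is piecewise constant
plot(ages, grid$prob[grid$sex == "F"], type = "s", col = "red", ylim = c(0, 1),
     xlab = "age", ylab = "P(runs_again = TRUE)")
lines(ages, grid$prob[grid$sex == "M"], type = "s", col = "blue")
legend("topleft", legend = c("F", "M"), col = c("red", "blue"), lty = 1)
```

In the course workspace, only the rpart() line (with the real Ran_twice) corresponds directly to the first instruction; the grid and plotting code are one possible way to see the probability scale described in the second.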