Exercise

The maximum error rate

The 10,000 runners in the Runners data can't all start at the same time. They line up behind the start line, in a queue that stretches back about half a mile. A handful of elite runners are given spots right at the start line, but everyone else has to find a place in the line-up.

The start_position variable categorizes the enthusiasm of the runners based on how close they maneuvered to the start line before the gun. The variable is categorical, with levels "calm", "eager", and "mellow". The question for this exercise is whether other variables in Runners can account for start_position. Since the response variable start_position is categorical, rpart() is an appropriate model architecture.

In this exercise, you'll investigate the prediction performance of what is sometimes called the null model. This is a model with no explanatory variables, the equivalent of saying "I don't know what might explain that." The output of the null model will be the same for every input.
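To see what "the same for every input" means for a categorical response, here is a minimal base-R sketch. It assumes the constant output is the most common level, which is what a classifier with nothing to split on falls back to. The counts below are made up for illustration and are not taken from Runners.

```r
# Hypothetical stand-in for Runners$start_position (the counts are made up).
start_position <- factor(rep(c("calm", "eager", "mellow"),
                             times = c(500, 300, 200)))

# With no explanatory variables, the best single guess is the most common level.
most_common <- names(which.max(table(start_position)))

# The null model gives that same level as the output for every case.
null_output <- factor(rep(most_common, length(start_position)),
                      levels = levels(start_position))

# Error rate: the fraction of cases where the output differs from the actual value.
null_error <- mean(start_position != null_output)
null_error  # 0.5 for these made-up counts
```

With these counts the output is always "calm", so the model is wrong for every "eager" and "mellow" runner and right for everyone else.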

You might think that randomly guessing the output would perform about as well as the null model. So you'll also look at the prediction performance of random guessing.
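The two error rates can be compared on paper before running anything. This is a short derivation, not part of the exercise itself. It assumes the random guess is a shuffled copy of start_position and that the null model outputs the most common level. The symbol p_k (the proportion of cases at level k) is notation introduced here.

```latex
% p_k: proportion of cases whose start_position is level k (notation introduced here)
\begin{align*}
\text{error rate of the null model} &= 1 - \max_k p_k \\
\text{expected error rate of a shuffled guess} &= 1 - \sum_k p_k^2 \\
\sum_k p_k^2 \;\le\; \Bigl(\max_j p_j\Bigr)\sum_k p_k &= \max_j p_j
\end{align*}
```

The last line says a shuffled guess matches the actual value no more often, on average, than the null model does. The two tie only when every level that occurs is equally common. For example, with made-up proportions 0.5, 0.3, and 0.2, the null model's error rate is 0.5 and the expected error rate of the shuffled guess is 1 - (0.25 + 0.09 + 0.04) = 0.62.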

Instructions
  • Construct the null model with start_position as the response variable.
  • Evaluate that model to get the outputs for each case. Note the type = "class" argument, which sets the format of the model output to be the levels from the response variable.
  • Calculate the error rate by comparing start_position to model_output.
  • Construct a set of random guesses. The command to do this, based on shuffle(), is provided in the editor.
  • Calculate the error rate by comparing start_position to the random guess.
  • Note that random guessing has a higher error rate than the null model.
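The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the exercise's solution: runners_demo is a hypothetical stand-in for Runners with made-up counts, the constant column all_the_same is a name chosen here so that rpart() has nothing to split on, predict() stands in for the model-evaluation step, and base R's sample() stands in for shuffle().

```r
library(rpart)

# Hypothetical stand-in for Runners; the counts are made up for illustration.
runners_demo <- data.frame(
  start_position = factor(rep(c("calm", "eager", "mellow"),
                              times = c(500, 300, 200)))
)

# A constant column gives rpart() a formula to fit but nothing to split on,
# so the fitted tree is a single root node: the null model.
runners_demo$all_the_same <- 1
null_model <- rpart(start_position ~ all_the_same, data = runners_demo)

# type = "class" returns levels of the response rather than probabilities.
runners_demo$model_output <- predict(null_model, newdata = runners_demo,
                                     type = "class")
null_error <- mean(runners_demo$start_position != runners_demo$model_output)

# sample() with no size argument returns a random permutation of its input,
# used here as an assumed equivalent of shuffle().
set.seed(1)
runners_demo$random_guess <- sample(runners_demo$start_position)
guess_error <- mean(runners_demo$start_position != runners_demo$random_guess)

# Compare the two error rates side by side.
c(null_model = null_error, random_guess = guess_error)
```

The null model's error rate here is exactly 0.5, the share of runners who are not in the most common level. The random-guess error rate depends on the seed, so no value is quoted for it.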