Exercise

Repeating random trials

In the previous exercise, you implemented a cross validation trial. We call it a trial because it involves random assignment of cases to the training and testing sets. The result of the calculation was therefore (somewhat) random.

Since the result of cross validation varies from trial to trial, it's helpful to run many trials so that you can see how much variation there is. As you'll see, this will be a common process as you move through the course.

To simplify things, the cv_pred_error() function in the statisticalModeling package carries out this repetitive process for you. All you need to do is provide one or more models as input to cv_pred_error(); the function does all the work of creating the training and testing sets for each trial and calculating the mean square error on the testing set for each trial. Easy!
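
Under the hood, each trial amounts to splitting the data at random, fitting the model on one part, and measuring prediction error on the other. The sketch below shows one such trial written out by hand; the 50/50 split and the variable names here are illustrative assumptions of mine, not the package's internals, since cv_pred_error() handles these details for you.

# One cross-validation trial by hand (illustrative sketch only)
n <- nrow(Runners_100)
test_rows <- sample(n, size = round(n / 2))   # random half held out for testing
training <- Runners_100[-test_rows, ]
testing  <- Runners_100[test_rows, ]
trial_model <- lm(net ~ age + sex, data = training)
preds <- predict(trial_model, newdata = testing)
# Mean square prediction error on the held-out testing set
mean((testing$net - preds)^2, na.rm = TRUE)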

The purpose of this exercise is to see whether the prediction error calculated from the training data is consistently different from the cross-validated prediction error. To that end, you'll first calculate the in-sample error, that is, the error when the model is evaluated on the very data it was trained on. Then you'll run the cross validation trials and use a t-test to see whether the in-sample error is statistically different from the cross-validated error.

Run the following code in the console (it's okay to copy and paste):

# The model
model <- lm(net ~ age + sex, data = Runners_100)

# Find the in-sample error (using the training data)
in_sample <- evaluate_model(model, data = Runners_100)
in_sample_error <- 
  with(in_sample, mean((net - model_output)^2, na.rm = TRUE))

# Calculate MSE for many different trials
trials <- cv_pred_error(model)

# View the cross-validated prediction errors
trials

# Find a confidence interval on the trials and compare it to in_sample_error
mosaic::t.test(~ mse, mu = in_sample_error, data = trials)
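
As an optional extra, you can look at the trial-to-trial spread of the cross-validated errors directly before interpreting the t-test. This one-liner is a sketch that assumes the ggformula package is available (it loads along with mosaic in recent versions); it isn't part of the exercise itself.

# Optional: histogram of the cross-validated MSE across trials
library(ggformula)   # assumed available; not required by the exercise
gf_histogram(~ mse, data = trials)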

The error based on the training data is ___ the 95% confidence interval representing the cross-validated prediction error.
