Tidying up

Before you get started with cross validation, take a moment to tidy up your R commands. The term tidy programming refers to a style where variables are always kept as part of a data frame and the functions always take a data frame as an input.

Let's look at some examples of the (untidy?) style that you have been using up to now, alongside a tidy version that accomplishes the same thing.

Model training is intrinsically tidy. predict() is untidy, since its output is not part of a data frame:

mod <- lm(net ~ age + sex, data = Runners)
out <- predict(mod, newdata = Runners)

Calling mean() (the base R version) this way is also untidy, since you have to use $ to extract a variable from a data frame:

mean((Runners$net - out)^2, na.rm = TRUE)

Here's a tidier way to predict and calculate MSE:

out2 <- evaluate_model(mod, data = Runners)
with(data = out2, mean((net - model_output) ^ 2, na.rm = TRUE))

Thanks to the evaluate_model() function from the statisticalModeling package, out2 is a data frame containing both inputs and the corresponding outputs, side-by-side. This will replace the untidy predict() for the remainder of the course. with() avoids the need to use the untidy $.
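
To see the two styles side by side without any extra packages, here is a minimal base R sketch. It rests on assumptions: the built-in mtcars data stands in for Runners, and tidy_predict() is a hypothetical helper written only for illustration. It is not how evaluate_model() is implemented; it merely mimics the idea of returning the inputs with the predictions attached as a column.

```r
# Stand-in data: built-in mtcars (an assumption; Runners needs statisticalModeling).
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Untidy style: predictions live in a free-standing vector,
# and $ is needed to pull the response out of the data frame.
out <- predict(mod, newdata = mtcars)
mse_untidy <- mean((mtcars$mpg - out)^2, na.rm = TRUE)

# Hypothetical helper (NOT the package's implementation): return the inputs
# with the predictions attached as a column named model_output.
tidy_predict <- function(model, data) {
  data$model_output <- as.numeric(predict(model, newdata = data))
  data
}

# Tidy style: inputs and outputs sit in one data frame, and with()
# evaluates the MSE expression inside it, so no $ is required.
out2 <- tidy_predict(mod, data = mtcars)
mse_tidy <- with(data = out2, mean((mpg - model_output)^2, na.rm = TRUE))

# Both styles compute the same mean square error.
all.equal(mse_untidy, mse_tidy)
```

The only difference between the two calculations is where the predictions are stored; keeping them in the data frame means every later step can take that data frame as its input.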

The statisticalModeling package and Runners dataset are loaded, so you can run the code above in your console. What is the name of the variable that holds the model output in the data frame produced by evaluate_model()?
