
# Evaluate a modeling procedure using n-fold cross-validation

In this exercise you will use `splitPlan`, the 3-fold cross-validation plan from the previous exercise, to make predictions from a model that predicts `mpg$cty` from `mpg$hwy`.

If `dframe` is the training data, then one way to add a column of cross-validation predictions to the frame is as follows:

```
# Initialize a column of the appropriate length
dframe$pred.cv <- 0

# k is the number of folds
# splitPlan is the cross-validation plan
for(i in 1:k) {
  # Get the ith split
  split <- splitPlan[[i]]

  # Build a model on the training data from this split
  # (lm, in this case)
  model <- lm(fmla, data = dframe[split$train, ])

  # Make predictions on the application data from this split
  dframe$pred.cv[split$app] <- predict(model, newdata = dframe[split$app, ])
}
```
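The template assumes that `fmla`, `k`, and `splitPlan` already exist in the workspace. As a minimal sketch (the exact call used to build the plan in the previous exercise is an assumption here), the setup might look like this:

```
# Assumed setup -- the previous exercise built splitPlan;
# vtreat::kWayCrossValidation is one way to create such a plan
library(vtreat)
library(ggplot2)  # the mpg data frame ships with ggplot2

k <- 3             # number of folds
fmla <- cty ~ hwy  # formula for the model
splitPlan <- kWayCrossValidation(nrow(mpg), k, NULL, NULL)
```

Each element of `splitPlan` is a list with a `train` slot (the row indices to fit the model on) and an `app` slot (the row indices to make predictions on).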

Cross-validation predicts how well a model built from all the data will perform on new data. As with the test/train split, for a good modeling procedure, cross-validation performance and training performance should be close.

## Instructions


The data frame `mpg`, the cross-validation plan `splitPlan`, and the function to calculate RMSE (`rmse()`) from the previous exercises are available in your workspace.

- Run the 3-fold cross-validation plan from `splitPlan` and put the predictions in the column `mpg$pred.cv`.
  - Use `lm()` and the formula `cty ~ hwy`.
- Create a linear regression model on all the `mpg` data (formula `cty ~ hwy`) and assign its predictions to `mpg$pred`.
- Use `rmse()` to get the root mean squared error of the predictions from the full model (`mpg$pred`). *Recall that `rmse()` takes two arguments: the predicted values and the actual outcome.*
- Get the root mean squared error of the cross-validation predictions. Are the two values about the same? (A sketch of one full solution follows this list.)
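
Here is a sketch of one full solution, assuming `mpg`, `splitPlan`, and `rmse()` are in the workspace as described. The `rmse()` body below is a hypothetical stand-in matching the usual definition of root mean squared error:

```
# Hypothetical stand-in for the rmse() from the earlier exercise:
# root mean squared error of predictions vs. actual outcomes
rmse <- function(predcol, ycol) sqrt(mean((predcol - ycol)^2))

# Run the 3-fold cross-validation plan
mpg$pred.cv <- 0
for(i in 1:3) {
  split <- splitPlan[[i]]
  model <- lm(cty ~ hwy, data = mpg[split$train, ])
  mpg$pred.cv[split$app] <- predict(model, newdata = mpg[split$app, ])
}

# Fit a linear regression model to all the data
# and store its predictions
mpg$pred <- predict(lm(cty ~ hwy, data = mpg))

# Compare full-model (training) RMSE to cross-validation RMSE
rmse(mpg$pred, mpg$cty)
rmse(mpg$pred.cv, mpg$cty)
```

If the two RMSE values are close, the modeling procedure generalizes well; a cross-validation RMSE much larger than the training RMSE would suggest overfitting.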