Evaluate a modeling procedure using n-fold cross-validation
In this exercise, you will use splitPlan, the 3-fold cross-validation plan from the previous exercise, to make cross-validated predictions from a linear model that predicts mpg$cty from mpg$hwy.
If dframe is the training data, then one way to add a column of cross-validation predictions to the frame is as follows:
# Initialize a column of the appropriate length
dframe$pred.cv <- 0 
# k is the number of folds
# splitPlan is the cross validation plan
for(i in 1:k) {
  # Get the ith split
  split <- splitPlan[[i]]
  # Build a model on the training data 
  # from this split 
  # (lm, in this case)
  model <- lm(fmla, data = dframe[split$train,])
  # make predictions on the 
  # application data from this split
  dframe$pred.cv[split$app] <- predict(model, newdata = dframe[split$app,])
}
Cross-validation predicts how well a model built from all the data will perform on new data. As with the test/train split, for a good modeling procedure, cross-validation performance and training performance should be close.
The data frame mpg, the cross validation plan splitPlan, and the rmse() function have been pre-loaded.
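The loop above relies only on each element of the plan exposing two index vectors, train and app. As a rough sketch of what such a plan could look like, here is a hand-rolled stand-in written in base R. The helper name make_split_plan is hypothetical and is not how the pre-loaded splitPlan was created; it only mimics the train/app structure that the loop uses.

```r
# Hypothetical stand-in for a cross-validation plan (NOT the pre-loaded splitPlan).
# Returns a list of k splits, each with `train` and `app` row indices.
make_split_plan <- function(n_rows, k) {
  # Randomly assign each row to one of k folds of near-equal size
  fold_id <- sample(rep(seq_len(k), length.out = n_rows))
  lapply(seq_len(k), function(i) {
    list(
      train = which(fold_id != i),  # rows used to fit the model
      app   = which(fold_id == i)   # held-out rows to predict on
    )
  })
}

set.seed(1)
plan <- make_split_plan(n_rows = 10, k = 3)
str(plan)
```

Because every row lands in exactly one app set, a loop over the plan fills each entry of the prediction column exactly once, always from a model that never saw that row.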
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Run the 3-fold cross-validation plan from splitPlan and put the predictions in the column mpg$pred.cv.
  - Use lm() and the formula cty ~ hwy.
- Create a linear regression model on all the mpg data (formula cty ~ hwy) and assign the predictions to mpg$pred.
- Use rmse() to get the root mean squared error of the predictions from the full model (mpg$pred). Recall that rmse() takes two arguments: the predicted values and the actual outcome.
- Get the root mean squared error of the cross-validation predictions. Are the two values about the same?
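The definition of the pre-loaded rmse() is not shown here. As a reminder of what a root mean squared error computes, a minimal sketch might look like the following. The name rmse_sketch is hypothetical, and the only assumption carried over from the exercise is the argument order (predictions first, then actual outcomes).

```r
# Hypothetical sketch of an RMSE helper; the pre-loaded rmse() may differ in detail.
rmse_sketch <- function(predcol, ycol) {
  res <- predcol - ycol   # residuals: prediction minus actual outcome
  sqrt(mean(res^2))       # square, average, then take the square root
}

# Toy check: every prediction is off by exactly 2, so the RMSE is 2
pred   <- c(1, 2, 3, 4)
actual <- c(3, 4, 5, 6)
rmse_sketch(pred, actual)
```

Since the residuals are squared, swapping the two arguments gives the same value, but keeping a consistent order makes the code easier to read.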
Hands-on interactive exercise
Try this exercise by completing this sample code.
# mpg is available
summary(mpg)
# splitPlan is available
str(splitPlan)
# Run the 3-fold cross validation plan from splitPlan
k <- ___ # Number of folds
mpg$pred.cv <- 0 
for(i in ___) {
  split <- ___
  model <- lm(___, data = ___)
  mpg$pred.cv[___] <- predict(___, newdata = ___)
}
# Predict from a full model
mpg$pred <- ___(___(cty ~ hwy, data = mpg))
# Get the rmse of the full model's predictions
___(___, ___)
# Get the rmse of the cross-validation predictions
___(___, ___)