Exercise

Evaluate a model using test/train split

Now you will test the model mpg_model on the test data, mpg_test. The functions rmse() and r_squared(), which calculate RMSE and R-squared, have been provided for convenience:

rmse(predcol, ycol)
r_squared(predcol, ycol)

where:

  • predcol: The predicted values
  • ycol: The actual outcome
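
The exercise does not show how the two helpers are implemented. As a minimal sketch, assuming they follow the standard definitions of RMSE (square root of the mean squared residual) and R-squared (one minus the ratio of residual to total sum of squares), they might look like this. The bodies below are an assumption, not the course's own code:

```r
# Hypothetical implementations; the pre-loaded versions may differ.

# RMSE: square root of the mean squared residual
rmse <- function(predcol, ycol) {
  res <- predcol - ycol
  sqrt(mean(res^2))
}

# R-squared: 1 - (residual sum of squares / total sum of squares)
r_squared <- function(predcol, ycol) {
  tss <- sum((ycol - mean(ycol))^2)  # variation of the outcome around its mean
  rss <- sum((predcol - ycol)^2)     # variation left unexplained by the model
  1 - rss / tss
}
```

Both take the predictions first and the actual outcome second, matching the signatures above. RMSE is in the same units as the outcome, while R-squared is unitless and equals 1 for perfect predictions.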

You will also plot the predictions vs. the outcome.

Generally, model performance is better on the training data than on the test data (though sometimes the test set "gets lucky"). A slight difference in performance is okay; if performance on the training data is significantly better, the model is probably overfitting and may not generalize well to new data.

The mpg_train and mpg_test data frames and the model mpg_model have been pre-loaded, along with the functions rmse() and r_squared().

Instructions

  • Predict city fuel efficiency from hwy on the mpg_train data. Assign the predictions to the column pred.
  • Predict city fuel efficiency from hwy on the mpg_test data. Assign the predictions to the column pred.
  • Use rmse() to evaluate RMSE for both the test and training sets. Compare. Are the performances similar?
  • Do the same with r_squared(). Are the performances similar?
  • Use ggplot2 to plot the predictions against cty on the test data.
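
The steps above can be sketched as follows. To keep the example self-contained, it rebuilds the pre-loaded objects itself. The 75/25 random split of ggplot2's mpg data, the seed, the lm() call for mpg_model, and the helper bodies are all assumptions, so the numbers it produces will not necessarily match the exercise:

```r
library(ggplot2)

# Assumed stand-ins for the pre-loaded helpers (standard definitions)
rmse <- function(predcol, ycol) sqrt(mean((predcol - ycol)^2))
r_squared <- function(predcol, ycol) {
  1 - sum((ycol - predcol)^2) / sum((ycol - mean(ycol))^2)
}

# Assumed setup: a random ~75/25 split and a linear model of cty on hwy
set.seed(1)  # arbitrary seed, only to make the split reproducible
in_train <- runif(nrow(mpg)) < 0.75
mpg_train <- mpg[in_train, ]
mpg_test <- mpg[!in_train, ]
mpg_model <- lm(cty ~ hwy, data = mpg_train)

# Predict city fuel efficiency on the training and test data
mpg_train$pred <- predict(mpg_model, newdata = mpg_train)
mpg_test$pred <- predict(mpg_model, newdata = mpg_test)

# Evaluate RMSE on both sets and compare
rmse_train <- rmse(mpg_train$pred, mpg_train$cty)
rmse_test <- rmse(mpg_test$pred, mpg_test$cty)

# Do the same with R-squared
rsq_train <- r_squared(mpg_train$pred, mpg_train$cty)
rsq_test <- r_squared(mpg_test$pred, mpg_test$cty)

# Plot predictions (x axis) against the actual outcome (y axis) on the test data;
# geom_abline() with its defaults draws the line y = x for reference
p <- ggplot(mpg_test, aes(x = pred, y = cty)) +
  geom_point() +
  geom_abline()
print(p)
```

Points close to the reference line are test rows where the prediction is close to the actual cty. If rmse_test is much larger than rmse_train, or rsq_test much lower than rsq_train, that suggests the model fits the training data better than it generalizes.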