Evaluate a model using test/train split
Now you will test the model mpg_model on the test data, mpg_test.
The functions rmse() and r_squared(), which calculate RMSE and R-squared, have been provided for convenience:
rmse(predcol, ycol)
r_squared(predcol, ycol)
where:
- predcol: The predicted values
- ycol: The actual outcome
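The exercise does not show how rmse() and r_squared() are implemented. Here is a minimal sketch that assumes they follow the standard definitions: RMSE is the square root of the mean squared residual, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares.

```r
# Hypothetical definitions: the course provides rmse() and r_squared(),
# but their source is not shown, so these assume the standard formulas.
rmse <- function(predcol, ycol) {
  res <- predcol - ycol      # residuals
  sqrt(mean(res^2))          # root of the mean squared residual
}

r_squared <- function(predcol, ycol) {
  tss <- sum((ycol - mean(ycol))^2)   # total sum of squares
  rss <- sum((predcol - ycol)^2)      # residual sum of squares
  1 - rss / tss
}

# Small worked example
y    <- c(1, 2, 3, 4)
pred <- c(1, 2, 3, 5)
rmse(pred, y)        # sqrt(1/4) = 0.5
r_squared(pred, y)   # 1 - 1/5 = 0.8
```

With this definition, R-squared on test data can be negative when the model predicts worse than simply using the mean of ycol.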
You will also plot the predictions vs. the outcome.
Generally, a model performs better on the training data than on the test data (though sometimes the test set "gets lucky"). A slight difference in performance is okay. If performance on the training data is much better than on the test data, however, the model is probably overfitting.
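The following sketch illustrates this train/test gap on simulated data. The data-generating line (y = 2x plus noise), the sample sizes, and the degree-10 polynomial are all arbitrary choices made for illustration and are not part of the course. A flexible model fitted to few points tends to score much better on its training data than on new data. A simple model that matches the true relationship tends to score similarly on both.

```r
set.seed(42)

# Simulated data: a linear relationship plus Gaussian noise
make_data <- function(n) {
  x <- runif(n, 0, 10)
  data.frame(x = x, y = 2 * x + rnorm(n, sd = 2))
}
train <- make_data(15)    # small training set
test  <- make_data(200)   # larger held-out test set

rmse <- function(predcol, ycol) sqrt(mean((predcol - ycol)^2))

# Simple model with the same form as the true relationship
simple  <- lm(y ~ x, data = train)
# Very flexible model: a degree-10 polynomial fitted to only 15 points
complex <- lm(y ~ poly(x, 10), data = train)

results <- data.frame(
  model      = c("simple", "complex"),
  train_rmse = c(rmse(predict(simple), train$y),
                 rmse(predict(complex), train$y)),
  test_rmse  = c(rmse(predict(simple, newdata = test), test$y),
                 rmse(predict(complex, newdata = test), test$y))
)
print(results)
```

Expect the complex model's training RMSE to be the lowest of the four numbers and its test RMSE to be noticeably higher. That large gap is the warning sign described above.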
The mpg_train and mpg_test data frames and the model mpg_model have been pre-loaded, along with the functions rmse() and r_squared().
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Use mpg_model to predict city fuel efficiency (cty) from hwy on the mpg_train data. Assign the predictions to the column pred.
- Use mpg_model to predict city fuel efficiency from hwy on the mpg_test data. Assign the predictions to the column pred.
- Use rmse() to evaluate RMSE for both the training and test sets. Compare them. Are the performances similar?
- Do the same with r_squared(). Are the performances similar?
- Use ggplot2 to plot the predictions against cty on the test data.
Have a go at this exercise by completing this sample code.
# Examine the objects that have been loaded
ls.str()
# predict cty from hwy for the training set
mpg_train$pred <- ___
# predict cty from hwy for the test set
mpg_test$pred <- ___
# Evaluate the rmse on both training and test data and print them
(rmse_train <- ___)
(rmse_test <- ___)
# Evaluate the r-squared on both training and test data and print them
(rsq_train <- ___)
(rsq_test <- ___)
# Plot the predictions (on the x-axis) against the outcome (cty) on the test data
ggplot(___, aes(x = ___, y = ___)) +
  geom_point() +
  geom_abline()
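For reference, here is an end-to-end sketch of the whole workflow on ggplot2's mpg data. The split and the model are assumptions. The course's pre-loaded mpg_train, mpg_test and mpg_model are not shown, so this sketch uses a random split of roughly 75/25 and assumes mpg_model is lm(cty ~ hwy). It also redefines rmse() and r_squared() with their standard formulas so the block runs on its own.

```r
library(ggplot2)

# Assumed split: a random ~75/25 split of mpg (the course's split is not shown)
set.seed(1)
in_train  <- runif(nrow(mpg)) < 0.75
mpg_train <- mpg[in_train, ]
mpg_test  <- mpg[!in_train, ]

# Assumed model: city mpg as a linear function of highway mpg
mpg_model <- lm(cty ~ hwy, data = mpg_train)

# Assumed standard definitions of the provided helper functions
rmse      <- function(predcol, ycol) sqrt(mean((predcol - ycol)^2))
r_squared <- function(predcol, ycol) {
  1 - sum((predcol - ycol)^2) / sum((ycol - mean(ycol))^2)
}

# Predictions on the training and test sets
mpg_train$pred <- predict(mpg_model, newdata = mpg_train)
mpg_test$pred  <- predict(mpg_model, newdata = mpg_test)

# Compare RMSE and R-squared across the two sets
(rmse_train <- rmse(mpg_train$pred, mpg_train$cty))
(rmse_test  <- rmse(mpg_test$pred, mpg_test$cty))
(rsq_train  <- r_squared(mpg_train$pred, mpg_train$cty))
(rsq_test   <- r_squared(mpg_test$pred, mpg_test$cty))

# Predictions (x-axis) against the observed outcome (y-axis) on the test data
p <- ggplot(mpg_test, aes(x = pred, y = cty)) +
  geom_point() +
  geom_abline()   # no arguments: slope 1, intercept 0, the line y = x
print(p)
```

Points close to the y = x line are test cars whose predicted cty matches the observed value. Because the split is random, the exact metric values depend on the seed. On the training set, the R-squared computed this way equals the value that summary(mpg_model) reports.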