Predict on test set
Now that you have a randomly split training set and test set, you can use the lm() function as you did in the first exercise to fit a model to your training set, rather than the entire dataset. Recall that you can use the formula interface to the linear regression function to fit a model with a specified target variable using all other variables in the dataset as predictors:
model <- lm(y ~ ., training_data)
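As a minimal sketch of what the dot in y ~ . expands to, the R code below uses the built-in mtcars data frame as a stand-in for your own data (an assumption; the exercise uses a different dataset) together with a hypothetical 80/20 random split. Fitting mpg ~ . makes every other column a predictor, so the model ends up with an intercept plus one coefficient for each of the remaining 10 columns.

```r
# Stand-in data: the built-in mtcars data frame (32 rows, 11 columns).
# The 80/20 split and the seed are illustrative choices, not from the exercise.
set.seed(42)
rows <- sample(nrow(mtcars))          # shuffle the row indices
split <- round(nrow(mtcars) * 0.8)    # index of the last training row (26)
training_data <- mtcars[rows[1:split], ]
test_data <- mtcars[rows[(split + 1):nrow(mtcars)], ]

# "mpg ~ ." means: model mpg using all other columns as predictors
model <- lm(mpg ~ ., training_data)

# Intercept plus one coefficient per predictor column; mpg itself is absent
names(coef(model))
```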
You can use the predict() function to make predictions from that model on new data. The new dataset must contain all of the predictor columns from the training data, but they can be in a different order and hold different values. Here, rather than re-predicting on the training set, you can predict on the test set, which you did not use to train the model. This will allow you to determine the out-of-sample error for the model in the next exercise:
p <- predict(model, new_data)
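The sketch below, again assuming the built-in mtcars data as a stand-in and a hypothetical 80/20 random split, predicts on the held-out rows and checks that predict() matches columns by name: reversing the column order and dropping the response column mpg leaves the predictions unchanged.

```r
# Stand-in data: built-in mtcars with a hypothetical 80/20 random split
set.seed(42)
rows <- sample(nrow(mtcars))
split <- round(nrow(mtcars) * 0.8)
train <- mtcars[rows[1:split], ]
test <- mtcars[rows[(split + 1):nrow(mtcars)], ]

model <- lm(mpg ~ ., train)

# Predict on the test rows, which were not used to fit the model
p <- predict(model, test)

# predict() looks up predictor columns by name, so reordering them does not
# change the result, and the response column (mpg) is not required at all
shuffled <- test[, rev(setdiff(names(test), "mpg"))]
p2 <- predict(model, shuffled)
all.equal(p, p2)
```

Because p is a named vector with one entry per test row, in the same order as test, it lines up with test$mpg when you compute the out-of-sample error.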
This exercise is part of the course “Machine Learning with caret in R”.
Exercise instructions
- Fit an lm() model called model to predict price using all other variables as covariates. Be sure to use the training set, train.
- Predict on the test set, test, using predict(). Store these values in a vector called p.
Have a go at this exercise by completing this sample code.
# Fit lm model on train: model
# Predict on test: p