Predict on test set
Now that you have a randomly split training set and test set, you can use the lm() function as you did in the first exercise to fit a model to your training set rather than to the entire dataset. Recall that the formula interface to the linear regression function lets you fit a model with a specified target variable, using all other variables in the dataset as predictors:

mod <- lm(y ~ ., training_data)
You can use the predict() function to make predictions from that model on new data. The new dataset must contain all of the columns from the training data, but they can appear in a different order and hold different values. Here, rather than re-predicting on the training set, you predict on the test set, which you did not use to train the model. This will allow you to determine the out-of-sample error for the model in the next exercise:

p <- predict(model, new_data)
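As an illustration, here is a self-contained sketch of the full split/fit/predict workflow. It uses the built-in mtcars data frame as a stand-in for the course's dataset, and the 80/20 split mirrors the previous exercise:

```r
set.seed(42)  # make the random shuffle reproducible

# Randomly order the rows, then take an 80/20 train/test split
rows     <- sample(nrow(mtcars))
shuffled <- mtcars[rows, ]
split    <- round(nrow(shuffled) * 0.80)
train    <- shuffled[1:split, ]
test     <- shuffled[(split + 1):nrow(shuffled), ]

# Fit a linear model predicting mpg from all other columns
model <- lm(mpg ~ ., train)

# Predict on the held-out test set: one prediction per test row
p <- predict(model, test)
```

Note that predict() only needs the test set to contain the predictor columns the model was trained on; extra columns (like the target itself) are simply ignored.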
This exercise is part of the course “Machine Learning with caret in R”.
Exercise instructions
- Fit an lm() model called model to predict price using all other variables as covariates. Be sure to use the training set, train.
- Predict on the test set, test, using predict(). Store these values in a vector called p.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit lm model on train: model
# Predict on test: p
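A completed version of the sample code might look like the following, assuming train and test are the data frames created in the previous splitting exercises:

```r
# Stand-in train/test split so the sketch runs on its own;
# in the exercise, train and test already exist
set.seed(1)
idx   <- sample(nrow(mtcars), round(0.8 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Fit lm model on train: model
# (the exercise predicts price; mpg plays that role here)
model <- lm(mpg ~ ., train)

# Predict on test: p
p <- predict(model, test)
```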