Combining preprocessing methods
The preProcess argument to train() is not limited to imputing missing values. It also supports a wide variety of other preprocessing techniques that make your life as a data scientist much easier. You can read the full list by typing ?preProcess and reading the help page for this function.
One set of preprocessing functions that is particularly useful for fitting regression models is standardization: centering and scaling. You first center by subtracting the mean of each column from each value in that column, then you scale by dividing by the standard deviation.
Standardization transforms your data such that for each column, the mean is 0 and the standard deviation is 1. This makes it easier for regression models to find a good solution.
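To make the two steps concrete, here is a minimal base R sketch of centering and scaling done by hand, compared against the built-in scale() function. The toy data and the variable names are invented for illustration and are not part of the exercise.

```r
# Hypothetical toy data: two numeric columns on very different scales
m <- cbind(
  height_cm = c(150, 160, 170, 180, 190),
  income    = c(20000, 35000, 50000, 65000, 80000)
)

# Center: subtract each column's mean from every value in that column
centered <- sweep(m, 2, colMeans(m), FUN = "-")

# Scale: divide each centered column by that column's standard deviation
col_sds <- apply(m, 2, sd)
standardized <- sweep(centered, 2, col_sds, FUN = "/")

# Base R's scale() performs both steps in a single call
standardized_builtin <- scale(m, center = TRUE, scale = TRUE)

# Each column should now have mean 0 and standard deviation 1,
# up to floating-point error
colMeans(standardized)
apply(standardized, 2, sd)
```

Both approaches should give the same values. The manual version is only there to show what "center" and "scale" mean.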
This exercise is part of the course Machine Learning with caret in R.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit glm with median imputation
model <- train(
  x = ___,
  y = ___,
  method = ___,
  trControl = myControl,
  preProcess = ___
)

# Print model
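
As a rough illustration of combining several preprocessing methods in one call, the sketch below passes a character vector to preProcess. It runs on simulated data rather than the exercise's dataset. The names toy_x, toy_y, toy_control and toy_model are hypothetical, and the 5-fold cross-validation setup is an assumption, since the exercise's own myControl is not shown here.

```r
library(caret)

set.seed(42)

# Hypothetical toy data: two numeric predictors and a two-class outcome
n <- 200
toy_x <- data.frame(
  x1 = rnorm(n, mean = 50, sd = 10),
  x2 = rnorm(n, mean = 0.5, sd = 0.1)
)
toy_y <- factor(
  ifelse(toy_x$x1 + 100 * toy_x$x2 + rnorm(n, sd = 10) > 100, "yes", "no")
)

# Remove a few values so that imputation has something to do
toy_x$x1[sample(n, 10)] <- NA

# Assumed resampling setup (stand-in for the exercise's myControl)
toy_control <- trainControl(method = "cv", number = 5)

# Several preprocessing methods can be requested at once
# by passing a character vector to preProcess
toy_model <- train(
  x = toy_x,
  y = toy_y,
  method = "glm",
  trControl = toy_control,
  preProcess = c("medianImpute", "center", "scale")
)

# Print the fitted model, including the preprocessing summary
toy_model
```

The same character-vector pattern should apply to any of the other methods listed on the ?preProcess help page.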