Combining preprocessing methods

The preProcess argument to train() is not limited to imputing missing values. It also supports a wide variety of other preprocessing techniques that make your life as a data scientist much easier. For the full list, type ?preProcess and read the help page for this function.
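To see how several methods combine, here is a minimal sketch that calls caret's preProcess() function directly on a small data frame instead of going through train(). The data frame `toy` and its columns are made up for illustration. The method names "medianImpute", "center", and "scale" are listed on the ?preProcess help page, which also documents the order in which caret applies the steps.

```r
library(caret)

# Toy predictors with one missing value (hypothetical data, for illustration only)
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(10, 20, 30, 40, 50, 60)
)

# Estimate the preprocessing parameters (medians, means, standard deviations)
pp <- preProcess(toy, method = c("medianImpute", "center", "scale"))

# Apply the estimated transformations to the data
toy_pp <- predict(pp, toy)

# Check that no missing values remain
anyNA(toy_pp)
```

The same character vector of method names can be passed to the preProcess argument of train(), which then estimates the preprocessing steps as part of model fitting.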

One set of preprocessing functions that is particularly useful for fitting regression models is standardization: centering and scaling. You first center by subtracting the mean of each column from each value in that column, then you scale by dividing each centered value by that column's standard deviation.

Standardization transforms your data so that each column has a mean of 0 and a standard deviation of 1. Putting all predictors on a common scale makes it easier for regression models to find a good solution.
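The arithmetic behind centering and scaling can be sketched in base R, without caret. The data frame `x` and its values are invented for illustration. The sketch standardizes each column by hand and compares the result with base R's scale() function.

```r
# Toy numeric data (hypothetical values, for illustration only)
x <- data.frame(
  height = c(150, 160, 170, 180, 190),
  weight = c(50, 60, 65, 80, 95)
)

# Manual standardization: subtract the column mean, then divide by the column sd
manual <- as.data.frame(lapply(x, function(col) (col - mean(col)) / sd(col)))

# scale() centers and scales every column in one call
scaled <- as.data.frame(scale(x))

# Column means are (numerically) 0 and standard deviations are 1
colMeans(manual)
sapply(manual, sd)

# Both approaches give the same values
all.equal(manual, scaled, check.attributes = FALSE)
```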

This exercise is part of the course Machine Learning with caret in R.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit glm with median imputation
model <- train(
  x = ___, 
  y = ___,
  method = ___,
  trControl = myControl,
  preProcess = ___
)

# Print model