Exercise

Combining preprocessing methods

The preProcess argument to train() isn't limited to imputing missing values. It supports a wide variety of other preprocessing techniques that make your life as a data scientist much easier. For the full list, type ?preProcess and read the help page for that function.
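The same techniques can also be used outside train() through caret's standalone preProcess() function. The sketch below assumes the caret package is installed and uses a tiny made-up data frame rather than the course data. preProcess() estimates the statistics each method needs from the data, and predict() applies them. As the help page describes, caret applies the requested methods in its own fixed internal order rather than in the order you list them.

```r
library(caret)

# Hypothetical toy predictors with missing values (not the course data)
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(10, NA, 30, 40, 50)
)

# Estimate the preprocessing parameters from the data
pp <- preProcess(df, method = c("medianImpute", "center", "scale"))
print(pp)

# Apply the estimated transformations; the missing values are filled in
df_clean <- predict(pp, df)
print(df_clean)
```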

One set of preprocessing techniques that is particularly useful for fitting regression models is standardization, which combines centering and scaling (the "center" and "scale" methods in preProcess). You first center each column by subtracting the column's mean from every value in it. You then scale it by dividing each centered value by the column's standard deviation.
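As a small worked illustration, the two steps can be carried out by hand in base R on a made-up vector and compared with R's built-in scale(). By default, scale() centers and then divides by the sample standard deviation:

```r
# Made-up single "column" of values
x <- c(2, 4, 6, 8)

centered <- x - mean(x)        # center: subtract the mean (5)
scaled   <- centered / sd(x)   # scale: divide by the sample standard deviation

print(centered)                # -3 -1  1  3
print(round(scaled, 3))        # -1.162 -0.387  0.387  1.162

# scale() performs the same two steps; as.vector() drops its matrix attributes
print(all.equal(scaled, as.vector(scale(x))))   # TRUE
```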

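Standardization transforms your data so that every column has a mean of 0 and a standard deviation of 1. Putting all predictors on a common scale makes it easier for many regression models to find a good solution, particularly models fit by iterative optimization or with a penalty on coefficient size.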
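The following base R snippet checks that property. It uses three columns of the built-in mtcars data purely as an illustrative stand-in for your own predictors:

```r
# Standardize three columns of the built-in mtcars data
std <- scale(mtcars[, c("mpg", "hp", "wt")])

# Column means are 0 up to floating-point error
print(round(colMeans(std), 10))

# Column standard deviations are 1
print(apply(std, 2, sd))
```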

Instructions

  1. breast_cancer_x and breast_cancer_y are loaded in your workspace. Fit a logistic regression model called model to the breast cancer data using median imputation, then print it to the console.

  2. Update the model to include two more preprocessing steps: centering and scaling.
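Here is a sketch of both steps, assuming the caret package is installed. Because breast_cancer_x and breast_cancer_y exist only in the course workspace, the code builds a hypothetical stand-in of a similar shape: numeric predictors with some missing values and a two-class factor outcome. The 5-fold cross-validation control object ctrl is also an assumption and may not match the course's own settings.

```r
library(caret)

# Hypothetical stand-in for breast_cancer_x / breast_cancer_y:
# three numeric predictors, a two-class outcome, and some missing values
set.seed(42)
n <- 200
x <- data.frame(a = rnorm(n, 10, 2), b = rnorm(n, 50, 10), c = runif(n))
y <- factor(ifelse(x$a + 0.1 * x$b + rnorm(n) > 15, "malignant", "benign"))
x$a[sample(n, 20)] <- NA
x$b[sample(n, 20)] <- NA

# Assumed resampling scheme: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Step 1: logistic regression (glm on a factor outcome) with median imputation
set.seed(42)
model <- train(x = x, y = y, method = "glm",
               trControl = ctrl, preProcess = "medianImpute")
print(model)

# Step 2: the same model, now also centering and scaling the predictors
set.seed(42)
model <- train(x = x, y = y, method = "glm",
               trControl = ctrl,
               preProcess = c("medianImpute", "center", "scale"))
print(model)
```

Because the steps are passed through train(), caret re-estimates the imputation, centering, and scaling inside each cross-validation fold. As a result, the resampled performance estimates are not influenced by the held-out rows.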