Use KNN imputation

In the previous exercise, you used median imputation to fill in missing values in the breast cancer dataset, but that is not the only possible method for dealing with missing data.

An alternative to median imputation is k-nearest neighbors (KNN) imputation. This more advanced method replaces each missing value with a value derived from the rows that are most similar to the current row. Although KNN imputation is considerably more complicated to implement than simple median imputation, it is easy to try in caret through the preProcess argument to train(): set preProcess = "knnImpute" to change the imputation method applied before model fitting.
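To make the idea of "similar rows" concrete, here is a minimal base R sketch of KNN imputation on a small numeric matrix. The helper knn_impute_sketch and the toy data are hypothetical and exist only for illustration. This is not caret's implementation: caret also centers and scales the predictors before searching for neighbors, a step this sketch skips.

```r
# Minimal sketch of k-nearest-neighbor imputation for a numeric matrix.
# knn_impute_sketch is a hypothetical helper, not part of caret.
# Assumes at least k rows have no missing values.
knn_impute_sketch <- function(x, k = 2) {
  x <- as.matrix(x)
  complete_rows <- which(stats::complete.cases(x))
  out <- x
  for (i in which(!stats::complete.cases(x))) {
    miss <- is.na(x[i, ])
    # Euclidean distance from row i to each complete row,
    # using only the columns that are observed in row i
    diffs <- sweep(x[complete_rows, !miss, drop = FALSE], 2, x[i, !miss])
    dists <- sqrt(rowSums(diffs^2))
    nearest <- complete_rows[order(dists)[seq_len(min(k, length(dists)))]]
    # Fill each missing entry with the mean of the neighbors' values
    out[i, miss] <- colMeans(x[nearest, miss, drop = FALSE])
  }
  out
}

# Toy data: the last row is missing its second value
toy <- rbind(
  c(1.0, 10, 100),
  c(1.1, 11, 110),
  c(5.0, 50, 500),
  c(1.2, NA, 120)
)

imputed <- knn_impute_sketch(toy, k = 2)
print(imputed)
# imputed[4, 2] is 10.5, the mean of 10 and 11 from the two closest rows
```

The third row is far from the incomplete row, so it does not contribute to the imputed value. Median imputation would instead use the column median (11) regardless of which rows are similar.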

This exercise is part of the course

Machine Learning with caret in R

Exercise instructions

breast_cancer_x and breast_cancer_y are loaded in your workspace.

  • Use the train() function to fit a glm model called knn_model to the breast cancer dataset.
  • Use KNN imputation to handle missing values.

Have a go at this exercise by completing this sample code.

# Apply KNN imputation: knn_model
knn_model <- train(
  x = ___, 
  y = ___,
  method = ___,
  trControl = myControl,
  preProcess = ___
)

# Print knn_model to console
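
For reference, here is one way the completed call could look, shown as a sketch on simulated data so that it runs outside the exercise workspace. The objects sim_x, sim_y, and my_control are hypothetical stand-ins for breast_cancer_x, breast_cancer_y, and myControl, and the 5-fold cross-validation setting is an assumption rather than the course's configuration. Depending on your caret version, "knnImpute" may also need the RANN package to be installed.

```r
library(caret)

set.seed(42)

# Simulated stand-in for the breast cancer predictors (hypothetical values)
n <- 200
sim_x <- data.frame(
  thickness = rnorm(n),
  size      = rnorm(n),
  shape     = rnorm(n)
)

# Two-class outcome derived from the predictors plus noise
sim_y <- factor(ifelse(sim_x$thickness + sim_x$size + rnorm(n) > 0,
                       "malignant", "benign"))

# Remove some values at random so there is something to impute
sim_x$size[sample(n, 20)] <- NA

# Assumed resampling setup; the exercise supplies its own myControl
my_control <- trainControl(method = "cv", number = 5)

# Apply KNN imputation: knn_model
knn_model <- train(
  x = sim_x,
  y = sim_y,
  method = "glm",
  trControl = my_control,
  preProcess = "knnImpute"
)

# Print knn_model to console
print(knn_model)
```

In the exercise itself, the same call applies with x = breast_cancer_x, y = breast_cancer_y, and trControl = myControl.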