Use KNN imputation
In the previous exercise, you used median imputation to fill in missing values in the breast cancer dataset, but that is not the only possible method for dealing with missing data.
An alternative to median imputation is k-nearest neighbors, or KNN, imputation. This is a more advanced form of imputation where missing values are replaced with values from other rows that are similar to the current row. While this is a lot more complicated to implement in practice than simple median imputation, it is very easy to explore in caret using the preProcess argument to train(). You can simply use preProcess = "knnImpute" to change the method of imputation used prior to model fitting.
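To make the idea of "similar rows" concrete, here is a minimal base R sketch of KNN imputation on a toy data frame. It is an illustration only, not caret's implementation: the function name knn_impute_sketch and the toy data are hypothetical, donors are restricted to rows with no missing values, and each incomplete row is assumed to have at least one observed value. According to the caret documentation, "knnImpute" also centers and scales the predictors, which this sketch skips.

```r
# Illustrative KNN imputation in base R (hypothetical helper, not caret's code).
knn_impute_sketch <- function(x, k = 2) {
  x <- as.matrix(x)
  # Donor rows: only rows with no missing values
  complete <- x[complete.cases(x), , drop = FALSE]
  stopifnot(nrow(complete) >= k)
  for (i in which(!complete.cases(x))) {
    miss <- is.na(x[i, ])
    obs <- !miss
    # Euclidean distance to each donor row, using only the observed columns
    diffs <- sweep(complete[, obs, drop = FALSE], 2, x[i, obs])
    d <- sqrt(rowSums(diffs^2))
    nearest <- order(d)[seq_len(k)]
    # Replace each missing value with the mean of the k nearest donors
    x[i, miss] <- colMeans(complete[nearest, miss, drop = FALSE])
  }
  x
}

# Toy data: row 5 is close to rows 1 and 2 in column a, and is missing b
toy <- data.frame(
  a = c(1.0, 1.1, 5.0, 5.2, 1.05),
  b = c(10, 11, 50, 52, NA)
)
imputed <- knn_impute_sketch(toy, k = 2)
print(imputed)
```

In this toy example the missing b value is filled from the two rows whose a values are closest, rather than from the overall median of b, which is what distinguishes KNN imputation from median imputation.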
This exercise is part of the course Machine Learning with caret in R.
Exercise instructions
breast_cancer_x and breast_cancer_y are loaded in your workspace.
- Use the train() function to fit a glm model called knn_model to the breast cancer dataset.
- Use KNN imputation to handle missing values.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Apply KNN imputation: knn_model
knn_model <- train(
  x = ___,
  y = ___,
  method = ___,
  trControl = myControl,
  preProcess = ___
)
# Print knn_model to console