Compare KNN and median imputation
All of the preprocessing steps in the train() function happen in the training set of each cross-validation fold, so the error metrics reported include the effects of the preprocessing. This includes the imputation method used (e.g. knnImpute or medianImpute). This is useful because it allows you to compare different methods of imputation and choose the one that performs the best out-of-sample.
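To make the per-fold idea concrete, here is a minimal sketch of what happens inside a single fold, done by hand with caret's preProcess() and predict(). The data frames train_x and test_x are hypothetical toy data, not part of the exercise; the point is only that the imputation value is learned from the training rows and then applied to the held-out rows.

```r
library(caret)

# Hypothetical toy data standing in for one cross-validation fold
train_x <- data.frame(x = c(1, 2, 3, NA, 100), z = c(5, 6, 7, 8, 9))
test_x  <- data.frame(x = c(NA, 5),            z = c(1, 2))

# Learn the median of each column from the training rows only
pp <- preProcess(train_x, method = "medianImpute")

# Apply the training-set medians to the held-out rows
test_imputed <- predict(pp, test_x)
print(test_imputed)
```

The missing value in test_x is filled with the median of the non-missing training values of x, so no information from the held-out rows leaks into the imputation step.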
median_model and knn_model are available in your workspace, as is resamples, which contains the resampled results of both models. Look at the results of the models by calling dotplot(resamples, metric = "ROC") and choose the one that performs the best out-of-sample. Which method of imputation yields the highest out-of-sample ROC score for your glm model?
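If you want to reproduce this comparison outside the exercise workspace, the following is a rough sketch of how objects like median_model, knn_model, and resamples could be built. The simulated data, the seed, the five folds, and the trainControl settings are all assumptions made for illustration; the exercise's actual data and settings are not shown on this page. Depending on your caret version, knnImpute may also need the RANN package installed.

```r
library(caret)

# Simulated two-class data (an assumption; not the exercise's dataset)
set.seed(42)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- factor(ifelse(x$a + x$b + rnorm(n) > 0, "yes", "no"),
            levels = c("no", "yes"))

# Introduce missing values in one predictor
x$a[sample(n, 20)] <- NA

# Share the same folds across both models so the results are comparable
folds <- createFolds(y, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", number = 5, index = folds,
                     summaryFunction = twoClassSummary, classProbs = TRUE)

# Same glm model, two imputation methods
median_model <- train(x = x, y = y, method = "glm", metric = "ROC",
                      trControl = ctrl, preProcess = "medianImpute")
# Note: knnImpute also centers and scales the predictors
knn_model <- train(x = x, y = y, method = "glm", metric = "ROC",
                   trControl = ctrl, preProcess = "knnImpute")

# Collect the per-fold results and compare them
resamples <- resamples(list(median_model = median_model,
                            knn_model = knn_model))
print(summary(resamples, metric = "ROC"))
print(dotplot(resamples, metric = "ROC"))
```

Because both models are evaluated on identical folds, any difference in the ROC distributions reflects the imputation method rather than a different split of the data. On simulated data like this the two methods may well perform similarly.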
This exercise is part of the course Machine Learning with caret in R.
