Compare KNN and median imputation

All of the preprocessing steps in the train() function are estimated on the training set of each cross-validation fold and then applied to that fold's held-out data, so the reported error metrics include the effects of the preprocessing.

This includes the imputation method used (e.g. knnImpute or medianImpute). This is useful because it allows you to compare different methods of imputation and choose the one that performs the best out-of-sample.
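To see what "happens in the training set of each fold" means for imputation, here is a minimal base-R sketch of median imputation on a single fold. The data and the split are made up for illustration; train() does the equivalent for you on every fold.

```r
# Toy predictor with missing values (made-up data, for illustration only)
x <- c(2, 4, NA, 8, 10, NA, 14, 16, NA, 100)

# Hypothetical split: rows 1-7 are the fold's training set, rows 8-10 the hold-out
train_idx <- 1:7
x_train <- x[train_idx]
x_test  <- x[-train_idx]

# The imputation value is learned from the training rows only
train_median <- median(x_train, na.rm = TRUE)

# The same training-set median fills gaps in both the training and hold-out rows
x_train_imputed <- ifelse(is.na(x_train), train_median, x_train)
x_test_imputed  <- ifelse(is.na(x_test),  train_median, x_test)

# For contrast: the median of all rows, which would let hold-out values leak in
full_median <- median(x, na.rm = TRUE)
```

Because the hold-out rows never influence the imputation value, the extreme value in the hold-out set cannot shift it, and the fold's error estimate stays honest.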

median_model and knn_model are available in your workspace, as is resamples, which contains the resampled results of both models. Look at the results of the models by calling

dotplot(resamples, metric = "ROC")

and choose the one that performs the best out-of-sample. Which method of imputation yields the highest out-of-sample ROC score for your glm model?
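For reference, objects like the ones in your workspace could be built roughly as follows. This is only a sketch under assumptions: the data are simulated (the exercise's actual data set is not shown here), the names x, y, folds, and ctrl are hypothetical, and the number of folds is arbitrary. It also assumes the pROC and RANN packages are installed, which caret is understood to need for twoClassSummary and knnImpute respectively.

```r
library(caret)

set.seed(42)

# Simulated two-class data (an assumption, not the exercise's data set)
n <- 200
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- factor(ifelse(x$a + x$b + rnorm(n) > 0, "yes", "no"),
            levels = c("no", "yes"))

# Knock out some predictor values at random so there is something to impute
x$a[sample(n, 20)] <- NA
x$b[sample(n, 20)] <- NA

# Fix the folds up front so both models are resampled on identical splits
folds <- createFolds(y, k = 5, returnTrain = TRUE)
ctrl <- trainControl(
  method = "cv",
  number = 5,
  index = folds,
  summaryFunction = twoClassSummary,  # reports ROC, Sens, Spec
  classProbs = TRUE                   # class probabilities are needed for ROC
)

# Same glm model, two imputation methods
median_model <- train(x = x, y = y, method = "glm", metric = "ROC",
                      trControl = ctrl, preProcess = "medianImpute")
knn_model <- train(x = x, y = y, method = "glm", metric = "ROC",
                   trControl = ctrl, preProcess = "knnImpute")

# Collect the per-fold results of both models
resamples <- resamples(list(median_model = median_model,
                            knn_model = knn_model))

# Numeric summary and dot plot of the out-of-sample ROC
summary(resamples, metric = "ROC")
dotplot(resamples, metric = "ROC")
```

Sharing the same folds between the two train() calls is what makes the comparison fair: any difference in ROC then comes from the imputation method rather than from a different split of the data.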

This exercise is part of the course

Machine Learning with caret in R
