Compare KNN and median imputation
All of the preprocessing steps in the train() function are estimated on the training portion of each cross-validation fold and then applied to the held-out portion, so the reported error metrics include the effects of the preprocessing.
This includes the imputation method (e.g. knnImpute or medianImpute). This is useful because it lets you compare different imputation methods and choose the one that performs best out-of-sample.
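The sketch below is a minimal illustration of how two such models could be built and compared. It assumes the caret and RANN packages are installed (knnImpute relies on RANN). The data is a hypothetical stand-in (two iris species with randomly blanked values), not the exercise's dataset. The shared fold indices and the variable name resamps are also illustrative choices; the name avoids masking caret's resamples() function. Passing the imputation method through preProcess is what makes train() re-estimate it inside each fold.

```r
# Assumes caret (which attaches lattice, providing dotplot()) and RANN are installed
library(caret)

set.seed(42)

# Hypothetical stand-in data: two iris species, with 40 predictor values set to NA
df <- droplevels(subset(iris, Species != "setosa"))
x <- as.matrix(df[, 1:4])
y <- df$Species
x[sample(length(x), 40)] <- NA

# Fixed training-set indices so both models are scored on identical splits
folds <- createFolds(y, k = 10, returnTrain = TRUE)
ctrl <- trainControl(
  method = "cv",
  index = folds,
  summaryFunction = twoClassSummary,  # reports ROC (AUC), Sens and Spec
  classProbs = TRUE                   # class probabilities are needed for ROC
)

# The imputation is fit on the training part of every fold, then applied to its holdout
median_model <- train(x, y, method = "glm", metric = "ROC",
                      trControl = ctrl, preProcess = "medianImpute")
knn_model <- train(x, y, method = "glm", metric = "ROC",
                   trControl = ctrl, preProcess = "knnImpute")

# Collect the per-fold results of both models and compare them
resamps <- resamples(list(median_model = median_model, knn_model = knn_model))
summary(resamps)
dotplot(resamps, metric = "ROC")
```

Because both models share one set of folds through index, their per-fold ROC values are paired, so the dotplot compares the two imputation methods on identical train/holdout splits.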
median_model and knn_model are available in your workspace, as is resamples, which contains the resampled results of both models. Look at the results of the models by calling
dotplot(resamples, metric = "ROC")
Which imputation method yields the highest out-of-sample ROC (area under the ROC curve) for your glm model?
This exercise is part of the course
Machine Learning with caret in R