
Use KNN imputation

In the previous exercise, you used median imputation to fill in missing values in the breast cancer dataset, but that is not the only possible method for dealing with missing data.

An alternative to median imputation is k-nearest neighbors, or KNN, imputation. This is a more advanced form of imputation where missing values are replaced with values from other rows that are similar to the current row. While this is a lot more complicated to implement in practice than simple median imputation, it is very easy to explore in caret using the preProcess argument to train(). You can simply use preProcess = "knnImpute" to change the method of imputation used prior to model fitting.
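To make the idea of "values from other rows that are similar to the current row" concrete, here is a minimal base-R sketch of KNN imputation on a toy data frame. It is not caret's implementation: the helper knn_impute_column, the toy data, and the choice of k = 2 are all hypothetical and only for illustration. It assumes that only the target column has missing values. caret's "knnImpute" method additionally centers and scales the predictors as part of preprocessing.

```r
# Toy data (made up for illustration): two clusters of rows,
# with one missing value in column `c` (row 2).
toy <- data.frame(
  a = c(1.0, 1.1, 0.9, 5.0, 5.2, 4.8),
  b = c(2.0, 2.1, 1.9, 8.0, 8.1, 7.9),
  c = c(10,  NA,  12,  50,  52,  48)
)

# Hypothetical helper: fill NAs in `target` with the mean of `target`
# over the k rows that are closest in the remaining columns.
# Assumes the remaining columns are numeric and contain no NAs.
knn_impute_column <- function(data, target, k = 2) {
  predictors <- setdiff(names(data), target)

  # Standardize predictors so that no single column dominates the distance.
  scaled <- scale(data[, predictors, drop = FALSE])

  # Rows with an observed target value can act as neighbors ("donors").
  donors <- which(!is.na(data[[target]]))

  for (i in which(is.na(data[[target]]))) {
    # Euclidean distance from row i to every donor row.
    diffs <- sweep(scaled[donors, , drop = FALSE], 2, scaled[i, ])
    dists <- sqrt(rowSums(diffs^2))

    # Average the target over the k nearest donors.
    nearest <- donors[order(dists)[seq_len(min(k, length(donors)))]]
    data[i, target] <- mean(data[[target]][nearest])
  }
  data
}

imputed <- knn_impute_column(toy, target = "c", k = 2)
print(imputed)
```

In this toy example, row 2 is closest to rows 1 and 3, so its missing value is filled with the mean of their c values (10 and 12), which is 11. A median imputation would instead use the median of all observed c values, ignoring the fact that row 2 clearly belongs to the first cluster.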

This exercise is part of the course

Machine Learning with caret in R


Exercise instructions

breast_cancer_x and breast_cancer_y are loaded in your workspace.

  • Use the train() function to fit a glm model called knn_model to the breast cancer dataset.
  • Use KNN imputation to handle missing values.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Apply KNN imputation: knn_model
knn_model <- train(
  x = ___, 
  y = ___,
  method = ___,
  trControl = myControl,
  preProcess = ___
)

# Print knn_model to console