Choosing the number of neighbors
k-Nearest-Neighbors (kNN) imputation fills the missing values in an observation using the values from the k other observations that are most similar to it. The number of these similar observations, called neighbors, is a parameter that has to be chosen beforehand.
How to choose k? One way is to try different values and see how they affect the relationships between the imputed and observed data.
Let's try imputing humidity in the tao data using three different values of k and see how the imputed values fit the relation between humidity and sea_surface_temp.
This exercise is part of the course
Handling Missing Data with Imputations in R
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Impute humidity using 30 neighbors
tao_imp <- ___(tao, k = ___, variable = ___)
# Draw a margin plot of sea_surface_temp vs humidity
tao_imp %>%
  select(sea_surface_temp, humidity, humidity_imp) %>%
  ___(delimiter = "imp", main = "k = 30")
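
For reference, here is a hedged sketch of how the comparison across several values of k might look once the blanks are filled in. It assumes the blanks stand for `kNN()` and `marginplot()` from the VIM package, which is an assumption and not something the template states. It also assumes the course's `tao` data can be approximated by renaming two columns of VIM's built-in `tao` dataset. Only k = 30 comes from the exercise; the other two values are purely illustrative.

```r
library(VIM)    # assumed source of kNN() and marginplot()
library(dplyr)

# Assumption: rebuild a stand-in for the course's `tao` from VIM's built-in data,
# renaming the two columns used here to the snake_case names in the exercise
data(tao, package = "VIM")
tao <- tao %>%
  rename(sea_surface_temp = Sea.Surface.Temp, humidity = Humidity)

# 30 is the value used in the exercise; 5 and 15 are illustrative choices
for (k in c(5, 15, 30)) {
  # Impute humidity only; kNN() appends a logical humidity_imp indicator column
  tao_imp <- kNN(tao, k = k, variable = "humidity")

  # Margin plot of sea_surface_temp vs humidity, with imputed points highlighted
  tao_imp %>%
    select(sea_surface_temp, humidity, humidity_imp) %>%
    marginplot(delimiter = "imp", main = paste("k =", k))
}
```

Comparing the three plots side by side shows whether the imputed points follow the pattern of the observed ones or collapse toward a central value as k grows.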