Exercise

Choosing the number of neighbors

k-Nearest-Neighbors (kNN) imputation fills in the missing values of an observation using the values of the k other observations that are most similar to it. The number of these similar observations, called neighbors, is a parameter that must be chosen beforehand.
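
To make the idea concrete, here is a minimal base R sketch on a made-up toy data frame. The names toy and impute_one are hypothetical, similarity is measured on a single variable x, and the neighbors' values are combined with the median. These are simplifying assumptions for illustration, not a description of how any particular package implements kNN imputation.

```r
# Toy data (made up for illustration): y is missing in the last row.
toy <- data.frame(
  x = c(1, 2, 3, 10, 11, 2.5),
  y = c(10, 12, 14, 50, 52, NA)
)

# Hypothetical helper: impute the single missing y from its k nearest rows.
# Assumes exactly one row has a missing y.
impute_one <- function(data, k) {
  target <- which(is.na(data$y))   # the incomplete observation
  donors <- which(!is.na(data$y))  # observations that can lend a value
  # Similarity is measured as the absolute difference in x.
  d <- abs(data$x[donors] - data$x[target])
  nearest <- donors[order(d)[seq_len(k)]]
  # Combine the neighbors' observed values with the median.
  median(data$y[nearest])
}

impute_one(toy, k = 2)  # 13: only the two closest rows contribute
impute_one(toy, k = 5)  # 14: distant rows now contribute too
```

With a small k the imputed value reflects only the most similar rows; with a large k it is pulled toward the overall middle of the data.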

How to choose k? One way is to try different values and see how each one affects the relationship between the imputed and observed data.

Let's try imputing humidity in the tao data using three different values of k and see how the imputed values fit the relation between humidity and sea_surface_temp.

Instructions

  1. Impute humidity with kNN imputation using 30 neighbors and draw a marginplot() of sea_surface_temp vs humidity.
  2. Impute humidity with kNN imputation using 15 neighbors and draw a margin plot of sea_surface_temp vs humidity.
  3. Impute humidity with kNN imputation using 5 neighbors and draw a margin plot of sea_surface_temp vs humidity.
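
One possible way to approach the three steps is sketched below with the VIM package, which provides kNN() and marginplot(). The sketch assumes the copy of tao bundled with VIM and renames its columns to the lowercase, underscore-separated names used in this exercise; the tao data available in the exercise may differ. The helper impute_humidity is a hypothetical name introduced here for illustration.

```r
library(VIM)

# Assumption: use the tao data shipped with VIM and normalize its column
# names (e.g. Sea.Surface.Temp -> sea_surface_temp) to match the exercise.
data(tao, package = "VIM")
names(tao) <- tolower(gsub(".", "_", names(tao), fixed = TRUE))

# Hypothetical helper: impute only humidity using k neighbors.
# kNN() appends a logical column humidity_imp flagging the imputed rows.
impute_humidity <- function(data, k) {
  kNN(data, variable = "humidity", k = k)
}

# Draw one margin plot per value of k.
for (k in c(30, 15, 5)) {
  tao_imp <- impute_humidity(tao, k)
  # delimiter = "imp" tells marginplot() which column marks imputed values.
  marginplot(
    tao_imp[, c("sea_surface_temp", "humidity", "humidity_imp")],
    delimiter = "imp"
  )
}
```

Comparing the three plots shows how closely the imputed points follow the pattern of the observed points for each k.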