Get startedGet started for free

Choosing the number of neighbors

k-Nearest-Neighbors (or kNN) imputation fills the missing values in an observation based on the values coming from the k other observations that are most similar to it. The number of these similar observations, called neighbors, that are considered is a parameter that has to be chosen beforehand.

How to choose k? One way is to try different values and see how they impact the relations between the imputed and observed data.

Let's try imputing humidity in the tao data using three different values of k and see how the imputed values fit the relation between humidity and sea_surface_temp.

This exercise is part of the course

Handling Missing Data with Imputations in R

View Course

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Impute humidity using 30 neighbors
tao_imp <- ___(tao, k = ___, variable = ___)

# Draw a margin plot of sea_surface_temp vs humidity
tao_imp %>% 
	select(sea_surface_temp, humidity, humidity_imp) %>% 
	___(delimiter = "imp", main = "k = 30")
Edit and Run Code