ComeçarComece de graça

Choosing the number of neighbors

k-Nearest-Neighbors (or kNN) imputation fills the missing values in an observation based on the values coming from the k other observations that are most similar to it. The number of these similar observations, called neighbors, that are considered is a parameter that has to be chosen beforehand.

How to choose k? One way is to try different values and see how they impact the relations between the imputed and observed data.

Let's try imputing humidity in the tao data using three different values of k and see how the imputed values fit the relation between humidity and sea_surface_temp.

Este exercício faz parte do curso

Handling Missing Data with Imputations in R

Ver curso

Exercício interativo prático

Experimente este exercício completando este código de exemplo.

# Impute humidity using 30 neighbors
tao_imp <- ___(tao, k = ___, variable = ___)

# Draw a margin plot of sea_surface_temp vs humidity
tao_imp %>% 
	select(sea_surface_temp, humidity, humidity_imp) %>% 
	___(delimiter = "imp", main = "k = 30")
Editar e executar o código