
Hot-deck tricks & tips I: imputing within domains

One trick that can help when hot-deck imputation breaks the relationships between variables is imputing within domains. If the variable to be imputed is correlated with another, categorical variable, you can run hot-deck separately within each of that variable's categories, so that donors always come from the same category as the missing value.
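To make the idea concrete, here is a minimal base-R sketch of a simplified random hot-deck within domains. It is an illustration only, not how any particular package implements hot-deck: the `toy` data frame, its values, and the helper `hotdeck_within()` are all hypothetical names made up for this example.

```r
# Toy data (hypothetical values, not the tao data): two domains with clearly
# different temperature levels and one missing value in each.
toy <- data.frame(
  year     = c(1993, 1993, 1993, 1997, 1997, 1997),
  air_temp = c(23.1, NA, 23.4, 27.8, 28.1, NA)
)

# Simplified random hot-deck within domains (hypothetical helper): each
# missing value is replaced by a randomly drawn observed value (a donor)
# from the same domain.
hotdeck_within <- function(x, domain) {
  for (d in unique(domain)) {
    in_domain <- domain == d
    donors    <- x[in_domain & !is.na(x)]
    missing   <- in_domain & is.na(x)
    if (length(donors) > 0 && any(missing)) {
      # Index into donors explicitly; sample(donors, ...) would misbehave
      # when there is only a single numeric donor.
      picks <- sample.int(length(donors), sum(missing), replace = TRUE)
      x[missing] <- donors[picks]
    }
  }
  x
}

set.seed(1)
toy$air_temp_filled <- hotdeck_within(toy$air_temp, toy$year)
toy
```

Because donors are restricted to the same year, the missing 1993 value can only be filled with a 1993 observation, and likewise for 1997. Without the domain restriction, a 1997 temperature could be donated to a 1993 row, which would blur the difference between the two years.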

For instance, you might expect air temperature to depend on time, since average temperatures are rising due to global warming. The time indicator available in the tao data is a categorical variable, year. First, check whether the average air temperature differs between the two studied years, then run hot-deck within year domains. Finally, draw the margin plot again to assess the imputation performance.
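As a rough illustration of the first check, the following base-R sketch compares per-group means on a small made-up data frame. The `toy` data and its values are hypothetical and stand in for the real tao data; the exercise itself uses a dplyr pipeline instead.

```r
# Hypothetical toy data: two years with different temperature levels
# and one missing value in each year.
toy <- data.frame(
  year     = c(1993, 1993, 1993, 1997, 1997, 1997),
  air_temp = c(23.1, NA, 23.4, 27.8, 28.1, NA)
)

# Mean air_temp per year, ignoring missing values.
means <- tapply(toy$air_temp, toy$year, mean, na.rm = TRUE)
means
```

A clear gap between the group means suggests that the grouping variable is a sensible choice of domain for the imputation.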

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • Calculate the mean air_temp for each year, excluding NAs from the calculation, and call the result average_air_temp.
  • Impute the missing values in air_temp in the tao data within year domains using hot-deck imputation and assign the result to tao_imp.
  • Create a margin plot of air_temp vs sea_surface_temp; remember to include air_temp_imp in the variables you pass to the plotting function.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Calculate mean air_temp per year
tao %>% 
	group_by(___) %>% 
	summarize(average_air_temp = mean(___, na.rm = ___))

# Hot-deck-impute air_temp in tao by year domain
tao_imp <- ___(___, variable = ___, ___ = ___)

# Draw a margin plot of air_temp vs sea_surface_temp
tao_imp %>% 
	select(___, ___, ___) %>% 
	marginplot(___ = ___)