Hot-deck tricks & tips II: sorting by correlated variables

Another trick that can boost the performance of hot-deck imputation is sorting the data by variables correlated with the one you want to impute.

For instance, in all the margin plots you have been drawing recently, you have seen that air temperature is strongly correlated with sea surface temperature, which makes a lot of sense. You can exploit this knowledge to improve your hot-deck imputation. If you first order the data by sea_surface_temp, then every imputed air_temp value will come from a donor with a similar sea_surface_temp. Let's see how this will work!
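To make the idea concrete, here is a minimal base-R sketch of a sequential hot-deck on sorted data. It is a simplified illustration, not how any particular package implements the method: the toy data, the `fill_from_neighbour` helper, and the `air_temp_filled` column are all hypothetical names made up for this example.

```r
# Toy data: air_temp tracks sea_surface_temp (hypothetical values, not the tao data)
set.seed(1)
toy <- data.frame(sea_surface_temp = runif(20, 22, 30))
toy$air_temp <- toy$sea_surface_temp - 1 + rnorm(20, sd = 0.3)
toy$air_temp[c(3, 8, 15)] <- NA

# Sort by the correlated variable so that neighbouring rows are similar
sorted <- toy[order(toy$sea_surface_temp), ]

# Each missing value takes the closest observed value that precedes it in the
# sorted order; a gap at the very top falls back to the first observed value
fill_from_neighbour <- function(x) {
  observed <- which(!is.na(x))
  for (i in which(is.na(x))) {
    before <- observed[observed < i]
    donor <- if (length(before) > 0) max(before) else min(observed)
    x[i] <- x[donor]
  }
  x
}

sorted$air_temp_filled <- fill_from_neighbour(sorted$air_temp)
```

Because the rows are ordered by sea_surface_temp, each donor sits next to the row it fills in, so it has a similar sea surface temperature and therefore, given the correlation, a plausible air temperature.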

This exercise is part of the course Handling Missing Data with Imputations in R.

Exercise instructions

  • Hot-deck-impute the missing values in air_temp in the tao data, ordering by sea_surface_temp, and assign the result to tao_imp.
  • Create a margin plot of air_temp vs sea_surface_temp.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

# Hot-deck-impute air_temp in tao ordering by sea_surface_temp
tao_imp <- ___(___, ___ = ___, ___ = ___)

# Draw a margin plot of air_temp vs sea_surface_temp
tao_imp %>% 
	select(air_temp, sea_surface_temp, air_temp_imp) %>% 
	___(___ = ___)
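For reference, one way the pattern in the template can be tried outside the exercise environment is sketched below. It assumes the VIM package is installed and uses its `hotdeck()` function (with the `variable` and `ord_var` arguments) and `marginplot()` (with the `delimiter` argument). The course's tao data is not reproduced here, so the sketch builds a synthetic stand-in called `tao_demo` with the same two column names; all values are made up.

```r
library(VIM)  # provides hotdeck() and marginplot()

# Synthetic stand-in for the course's tao data (hypothetical values)
set.seed(42)
n <- 200
tao_demo <- data.frame(sea_surface_temp = runif(n, 22, 30))
tao_demo$air_temp <- tao_demo$sea_surface_temp - 1 + rnorm(n, sd = 0.4)
missing_rows <- sample(n, 30)
tao_demo$air_temp[missing_rows] <- NA

# Hot-deck-impute air_temp, sorting by the correlated sea_surface_temp first
tao_demo_imp <- hotdeck(tao_demo,
                        variable = "air_temp",
                        ord_var = "sea_surface_temp")

# Margin plot; the "imp" delimiter tells VIM which column flags imputed values
marginplot(
  tao_demo_imp[, c("air_temp", "sea_surface_temp", "air_temp_imp")],
  delimiter = "imp"
)
```

By default `hotdeck()` appends a logical air_temp_imp column marking the imputed rows, which is why the template selects that column before plotting.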