
Mean-imputing the temperature

Mean imputation can be a risky business. If the variable you are mean-imputing is correlated with other variables, this correlation might be destroyed by the imputed values. You saw signs of this problem in the previous exercise when you analyzed the air_temp variable.
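To make the concern concrete, here is a minimal sketch on simulated data. It does not use the tao dataset; the vectors x, y, y_miss and y_imp are hypothetical, and the 40% missingness rate is an arbitrary assumption. It compares the correlation among the observed pairs with the correlation after mean imputation.

```r
# Toy data (hypothetical, not the tao dataset): y depends linearly on x
set.seed(42)
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.5)

# Remove 40% of the y values completely at random
y_miss <- y
y_miss[sample(200, 80)] <- NA

# Mean-impute: every missing y gets the mean of the observed y values
y_imp <- ifelse(is.na(y_miss), mean(y_miss, na.rm = TRUE), y_miss)

# Correlation among complete pairs vs. correlation after imputation
cor_observed <- cor(x, y_miss, use = "complete.obs")
cor_imputed <- cor(x, y_imp)

print(c(observed = cor_observed, imputed = cor_imputed))
```

Every imputed point sits at the same y value regardless of its x, so the imputed rows add nothing to the covariance while still counting as observations. The correlation after imputation is therefore pulled toward zero, and the variance of y_imp is smaller than the variance of the observed values.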

To find out whether these concerns are valid, in this exercise you will perform mean imputation on air_temp, while also creating a binary indicator for where the values are imputed. It will come in handy in the next exercise, when you will be assessing your imputation's performance. Let's fill in those missing values!

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • In the pipeline modifying tao, create a new variable called air_temp_imp that is TRUE if air_temp is missing and FALSE otherwise.
  • Later in the same pipeline, overwrite air_temp with its own mean whenever it is missing and leave it untouched otherwise, assigning the result to tao_imp.
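The general pattern of flagging first and imputing second can be sketched on a small made-up data frame. The names toy, toy_imp, temp and temp_imp are hypothetical and stand in for the exercise's objects; the sketch assumes the dplyr package is installed. The order matters: the indicator has to be created before the missing values are overwritten, otherwise there is nothing left to flag.

```r
library(dplyr)

# Hypothetical toy data frame with two missing temperature readings
toy <- data.frame(temp = c(27.1, NA, 26.4, NA, 25.9))

toy_imp <- toy %>%
  # Step 1: record which rows are missing, before anything is filled in
  mutate(temp_imp = is.na(temp)) %>%
  # Step 2: replace missing values with the mean of the observed values
  mutate(temp = ifelse(is.na(temp), mean(temp, na.rm = TRUE), temp))

print(toy_imp)
```

Keeping the indicator column alongside the imputed variable makes it possible to tell original and imputed values apart later, for example when plotting them in different colors.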

Hands-on interactive exercise

Try this exercise by completing the sample code below.

tao_imp <- tao %>% 
  # Create a binary indicator for missing values in air_temp
  ___(air_temp_imp = ifelse(___(___), ___, ___)) %>% 
  # Impute air_temp with its mean
  ___(air_temp = ifelse(___(___), ___(___, na.rm = ___), ___))

# Print the first 10 rows of tao_imp
head(tao_imp, 10)