Mean-imputing the temperature
Mean imputation can be a risky business. If the variable you are mean-imputing is correlated with other variables, this correlation might be destroyed by the imputed values. You saw it looming in the previous exercise when you analyzed the air_temp variable.
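To make the concern concrete, here is a minimal sketch on simulated data. It is not the tao dataset: the variables, sample size, and share of missing values are assumptions chosen for illustration. It compares the correlation computed on the observed rows with the correlation computed after mean imputation.

```r
# Simulated data (hypothetical, not the tao dataset): two correlated variables.
set.seed(42)
n <- 200
sea_temp <- rnorm(n, mean = 25, sd = 2)
air_temp <- sea_temp - 1 + rnorm(n, sd = 0.5)

# Set 80 of the 200 air_temp values to NA, chosen completely at random.
missing_idx <- sample(n, size = 80)
air_temp_na <- air_temp
air_temp_na[missing_idx] <- NA

# Correlation using only the rows where air_temp is observed.
cor_observed <- cor(air_temp_na, sea_temp, use = "complete.obs")

# Mean-impute air_temp, then recompute the correlation on all rows.
air_temp_filled <- ifelse(
  is.na(air_temp_na),
  mean(air_temp_na, na.rm = TRUE),
  air_temp_na
)
cor_imputed <- cor(air_temp_filled, sea_temp)

# Compare the two correlations side by side.
print(c(observed = cor_observed, imputed = cor_imputed))
```

Every imputed point sits exactly at the mean of air_temp, so it adds nothing to the covariance with sea_temp while still counting toward the spread of sea_temp. That is why the correlation after imputation cannot exceed the observed one in absolute value.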
To find out whether these concerns are valid, in this exercise you will perform mean imputation on air_temp, while also creating a binary indicator for where the values are imputed. This indicator will come in handy in the next exercise, when you will be assessing your imputation's performance. Let's fill in those missing values!
This exercise is part of the course
Handling Missing Data with Imputations in R
Exercise instructions
- In the pipeline modifying tao, create a new variable called air_temp_imp that is TRUE if air_temp is missing and FALSE otherwise.
- Later in the same pipeline, overwrite air_temp with its own mean whenever it is missing and leave it untouched otherwise, assigning the result to tao_imp.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
tao_imp <- tao %>%
  # Create a binary indicator for missing values in air_temp
  ___(air_temp_imp = ifelse(___(___), ___, ___)) %>%
  # Impute air_temp with its mean
  ___(air_temp = ifelse(___(___), ___(___, na.rm = ___), ___))

# Print the first 10 rows of tao_imp
head(tao_imp, 10)
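As a rough illustration of the pattern the instructions describe, the sketch below applies it to a small made-up data frame. The name toy and its values are hypothetical and only stand in for tao. The use of dplyr's mutate() is an assumption about how the pipeline is meant to be completed, not a confirmed solution.

```r
library(dplyr)

# Hypothetical toy data standing in for tao (values are made up).
toy <- data.frame(
  air_temp = c(27.1, NA, 26.5, NA, 25.9),
  sea_surface_temp = c(27.6, 27.5, 27.0, 26.8, 26.4)
)

toy_imp <- toy %>%
  # Flag the missing rows first, while the NAs are still present.
  mutate(air_temp_imp = ifelse(is.na(air_temp), TRUE, FALSE)) %>%
  # Replace missing values with the mean of the observed ones.
  mutate(air_temp = ifelse(is.na(air_temp), mean(air_temp, na.rm = TRUE), air_temp))

# Print the (at most) first 10 rows of the imputed data.
head(toy_imp, 10)
```

The order of the two steps matters: if air_temp were overwritten first, no missing values would remain and the indicator would be FALSE in every row.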