Initializing missing values & iterating over variables
As you have just seen, running impute_lm() might not fill-in all the missing values. To ensure you impute all of them, you should initialize the missing values with a simple method, such as the hot-deck imputation you learned about in the previous chapter, which simply feeds forward the last observed value.
Moreover, a single imputation is usually not enough. It is based on the basic initialized values and could be biased. A proper approach is to iterate over the variables, imputing them one at a time in the locations where they were originally missing.
In this exercise, you will first initialize the missing values with hot-deck imputation and then loop five times over air_temp and humidity from the tao data to impute them with linear regression. Let's get to it!
Deze oefening maakt deel uit van de cursus
Handling Missing Data with Imputations in R
Oefeninstructies
- Initialize the missing values with the
hotdeck()imputation. - Create a boolean mask for where
humiditywas originally missing and assign it tomissing_humidity. - Inside the for-loop, set the
humidityintao_imptoNAin places where it was originally missing using the boolean mask you have created. - Inside the for-loop, impute
humidityintao_impwith linear regression, usingyear,latitude,sea_surface_tempandair_tempas predictors and re-assign the result totao_imp.
Praktische interactieve oefening
Probeer deze oefening eens door deze voorbeeldcode in te vullen.
# Initialize missing values with hot-deck
tao_imp <- ___(tao)
# Create boolean masks for where air_temp and humidity are missing
missing_air_temp <- tao_imp$air_temp_imp
missing_humidity <- ___
for (i in 1:5) {
# Set air_temp to NA in places where it was originally missing and re-impute it
tao_imp$air_temp[missing_air_temp] <- NA
tao_imp <- impute_lm(tao_imp, air_temp ~ year + latitude + sea_surface_temp + humidity)
# Set humidity to NA in places where it was originally missing and re-impute it
tao_imp$humidity[___] <- ___
tao_imp <- ___(___, ___ ~ year + latitude + sea_surface_temp + ___)
}