Exercise

Initializing missing values & iterating over variables

As you have just seen, running impute_lm() might not fill in all the missing values. To ensure you impute all of them, you should first initialize the missing values with a simple method, such as the hot-deck imputation you learned about in the previous chapter, which simply feeds forward the last observed value.
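To see why initialization helps, here is a minimal sketch, assuming the simputation and VIM packages are installed. The toy data frame `df` and its columns `x` and `y` are hypothetical and not part of the exercise. A regression model cannot predict a response for a row whose predictor is itself missing, so that row is left as NA, whereas hot-deck imputation does not depend on any predictors.

```r
library(simputation)  # provides impute_lm()
library(VIM)          # provides hotdeck()

# Hypothetical toy data: row 3 is missing both y and its predictor x
df <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, NA, NA, 8.2, 9.9, 12.1)
)

# Regression imputation alone: y in row 2 can be predicted from x,
# but y in row 3 cannot, because x is missing there as well
df_lm <- impute_lm(df, y ~ x)
sum(is.na(df_lm$y))

# Hot-deck initialization fills the gaps in every column;
# imp_var = FALSE skips the extra imputation-indicator columns
df_init <- hotdeck(df, imp_var = FALSE)
anyNA(df_init)
```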

Moreover, a single round of imputation is usually not enough: it relies on the crude initialized values and could be biased. A better approach is to iterate over the variables several times, re-imputing each one in turn, but only in the locations where it was originally missing.

In this exercise, you will first initialize the missing values with hot-deck imputation and then loop five times over air_temp and humidity from the tao data to impute them with linear regression. Let's get to it!

Instructions

100 XP
  • Initialize the missing values with hotdeck() imputation.
  • Create a boolean mask for where humidity was originally missing and assign it to missing_humidity.
  • Inside the for-loop, set the humidity in tao_imp to NA in places where it was originally missing using the boolean mask you have created.
  • Inside the for-loop, impute humidity in tao_imp with linear regression, using year, latitude, sea_surface_temp and air_temp as predictors and re-assign the result to tao_imp.
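The steps above can be sketched end to end on synthetic data. This is only an illustration under stated assumptions, not the exercise solution: the data frame `toy` is a hypothetical stand-in for tao with made-up values, and the masks are built with is.na() on the original data rather than from any indicator columns that hotdeck() can add.

```r
library(simputation)  # provides impute_lm()
library(VIM)          # provides hotdeck()

# Hypothetical stand-in for the tao data (all values are synthetic)
set.seed(1)
n <- 50
toy <- data.frame(
  year = rep(c(1993, 1997), each = n / 2),
  latitude = rep(c(-5, -2, 0, 2, 5), times = n / 5),
  sea_surface_temp = rnorm(n, 27, 1)
)
toy$air_temp <- toy$sea_surface_temp - 1 + rnorm(n, 0, 0.3)
toy$humidity <- 80 + 2 * (toy$air_temp - 26) + rnorm(n, 0, 1)
toy$air_temp[c(3, 10, 21)] <- NA
toy$humidity[c(3, 15, 40)] <- NA

# Boolean masks for where each variable was originally missing
missing_air_temp <- is.na(toy$air_temp)
missing_humidity <- is.na(toy$humidity)

# Initialize all missing values with hot-deck imputation
toy_imp <- hotdeck(toy, imp_var = FALSE)

for (i in 1:5) {
  # Reset air_temp where it was originally missing, then re-impute it
  toy_imp$air_temp[missing_air_temp] <- NA
  toy_imp <- impute_lm(toy_imp, air_temp ~ year + latitude + sea_surface_temp + humidity)
  # Reset humidity where it was originally missing, then re-impute it
  toy_imp$humidity[missing_humidity] <- NA
  toy_imp <- impute_lm(toy_imp, humidity ~ year + latitude + sea_surface_temp + air_temp)
}
```

Because each variable is reset only under its own mask, observed values are never overwritten, and each pass replaces the previous round's imputations with predictions based on the latest values of the other variable.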