
Initializing missing values & iterating over variables

As you have just seen, running impute_lm() might not fill in all the missing values. When one of a row's predictors is itself missing, the model cannot produce a prediction for that row, so the value stays NA. To make sure every value gets imputed, first initialize the missing values with a simple method. One option is the hot-deck imputation from the previous chapter, which simply feeds forward the last observed value.
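The small base-R sketch below illustrates both points. It uses a made-up data frame (`toy`, hypothetical). A model fitted with `lm()` returns `NA` from `predict()` for any row whose predictor is missing. A carry-forward fill (`carry_forward()`, a hypothetical helper) then initializes every gap. This fill stands in for hot-deck initialization and does not reimplement `VIM::hotdeck()`.

```r
# Hypothetical toy data: y depends on x, and both have gaps
toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.0, NA, NA, 8.1, 9.9, 12.2)
)

# Fit y ~ x; lm() drops incomplete rows by default
fit <- lm(y ~ x, data = toy)

# Predict y where it is missing: row 3 has no x, so its prediction is NA
pred <- predict(fit, newdata = toy[is.na(toy$y), ])
print(pred)

# Hypothetical helper: carry the last observed value forward
# (a leading NA is backfilled with the first observed value)
carry_forward <- function(v) {
  if (is.na(v[1])) v[1] <- v[!is.na(v)][1]
  for (i in seq_along(v)[-1]) {
    if (is.na(v[i])) v[i] <- v[i - 1]
  }
  v
}

# Initialize every column; no NA remains afterwards
toy_init <- data.frame(lapply(toy, carry_forward))
print(toy_init)
```

After initialization, every predictor has a value, so a regression-based imputation can produce a prediction for every missing cell.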

Moreover, a single round of imputation is usually not enough. It uses the crude initialized values as predictors, so its results may be biased. A better approach is to iterate over the variables, re-imputing them one at a time in the locations where they were originally missing.
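Here is a minimal base-R sketch of that iteration, assuming hypothetical simulated data with two related variables `a` and `b`. Logical masks record where each variable was originally missing. Each pass resets those cells to `NA`, refits a linear model, and fills the cells with its predictions. `lm()` and `predict()` stand in for `simputation::impute_lm()`, and the fixed five passes mirror the exercise rather than a convergence check. To keep the sketch self-contained, it repeats a simple carry-forward initialization (`carry_forward()`, hypothetical) in place of hot-deck.

```r
# Hypothetical simulated data with gaps in two related variables
set.seed(1)
n <- 50
toy <- data.frame(z = rnorm(n))
toy$a <- 2 * toy$z + rnorm(n, sd = 0.3)
toy$b <- toy$a - toy$z + rnorm(n, sd = 0.3)
toy$a[sample(n, 8)] <- NA
toy$b[sample(n, 8)] <- NA

# Logical masks of the originally missing cells
miss_a <- is.na(toy$a)
miss_b <- is.na(toy$b)

# Initialize: carry the last observed value forward (leading NA backfilled)
carry_forward <- function(v) {
  if (is.na(v[1])) v[1] <- v[!is.na(v)][1]
  for (i in seq_along(v)[-1]) if (is.na(v[i])) v[i] <- v[i - 1]
  v
}
imp <- toy
imp$a <- carry_forward(imp$a)
imp$b <- carry_forward(imp$b)

# Re-impute one variable at a time, five passes
for (iter in 1:5) {
  # Reset a where it was originally missing, refit, and predict
  imp$a[miss_a] <- NA
  fit_a <- lm(a ~ z + b, data = imp)
  imp$a[miss_a] <- predict(fit_a, newdata = imp[miss_a, ])

  # Same for b, now using the freshly imputed a as a predictor
  imp$b[miss_b] <- NA
  fit_b <- lm(b ~ z + a, data = imp)
  imp$b[miss_b] <- predict(fit_b, newdata = imp[miss_b, ])
}
```

Observed values are never overwritten. Only the masked cells change from one pass to the next, and each pass uses the other variable's latest imputations as predictors.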

In this exercise, you will first initialize the missing values with hot-deck imputation and then loop five times over air_temp and humidity from the tao data to impute them with linear regression. Let's get to it!

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • Initialize the missing values with hotdeck() imputation.
  • Create a boolean mask for where humidity was originally missing and assign it to missing_humidity.
  • Inside the for-loop, set the humidity in tao_imp to NA in places where it was originally missing using the boolean mask you have created.
  • Inside the for-loop, impute humidity in tao_imp with linear regression, using year, latitude, sea_surface_temp and air_temp as predictors and re-assign the result to tao_imp.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

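# Load the packages providing hotdeck() (VIM) and impute_lm() (simputation)
library(VIM)
library(simputation)

# Initialize missing values with hot-deck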
tao_imp <- ___(tao)

# Create boolean masks for where air_temp and humidity were originally missing
# (hotdeck() adds logical *_imp columns flagging the imputed cells)
missing_air_temp <- tao_imp$air_temp_imp
missing_humidity <- ___

for (i in 1:5) {
  # Set air_temp to NA in places where it was originally missing and re-impute it
  tao_imp$air_temp[missing_air_temp] <- NA
  tao_imp <- impute_lm(tao_imp, air_temp ~ year + latitude + sea_surface_temp + humidity)
  # Set humidity to NA in places where it was originally missing and re-impute it
  tao_imp$humidity[___] <- ___
  tao_imp <- ___(___, ___ ~ year + latitude + sea_surface_temp + ___)
}