Model-based imputation with multiple variable types

Great job writing a function that implements logistic regression imputation by drawing from the conditional distribution. That's pretty advanced statistics you have coded! In this exercise, you will combine what you have learned so far about model-based imputation to impute different types of variables in the tao data.

Your task is to iterate over variables, just as you did in the previous chapter, and impute two variables:

  • is_hot, a new binary variable that was created out of air_temp, which is 1 if air_temp is at or above 26 degrees and is 0 otherwise;
  • humidity, a continuous variable you are already familiar with.
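
As a small illustration of how a variable like is_hot can be derived, here is a minimal sketch. The temperature values are made up for the example and are not taken from the tao data.

```r
# Hypothetical air temperatures; NA marks a missing reading
air_temp <- c(24.5, 26.0, 27.3, NA, 25.9)

# 1 where air_temp is at or above 26 degrees, 0 otherwise.
# Comparing NA with a number returns NA, so a missing air_temp
# produces a missing is_hot as well.
is_hot <- as.integer(air_temp >= 26)
is_hot
```

Because the missingness carries over from air_temp, is_hot needs imputing in the same rows.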

You will need the linear regression imputation function you learned about earlier, as well as your own function for logistic regression. Let's get to it!
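
For reference, a function like impute_logreg() might look roughly like the sketch below. The definition written earlier in the course is not shown on this page, so the body here is an assumption built only on base R (glm(), predict(), rbinom()), and the toy data frame is synthetic.

```r
# Sketch of a logistic regression imputer that draws from the conditional
# distribution. The body is an assumption, not the course's exact definition.
impute_logreg <- function(df, formula) {
  # Name of the response variable (left-hand side of the formula)
  imp_var <- all.vars(formula)[1]
  # Rows where the response is missing
  missing_imp_var <- is.na(df[[imp_var]])
  # Fit the model; glm() drops rows with a missing response by default
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predicted probability of a 1 for every row, including the missing ones
  probs <- predict(logreg_model, newdata = df, type = "response")
  # Draw 0/1 values from the fitted Bernoulli distribution instead of rounding
  draws <- rbinom(sum(missing_imp_var), size = 1, prob = probs[missing_imp_var])
  df[[imp_var]][missing_imp_var] <- draws
  df
}

# Synthetic example data (hypothetical, not the tao data)
set.seed(42)
n <- 200
toy <- data.frame(sea_surface_temp = rnorm(n, mean = 27, sd = 2))
toy$is_hot <- rbinom(n, size = 1, prob = plogis(toy$sea_surface_temp - 27))
toy$is_hot[sample(n, 30)] <- NA

toy_imp <- impute_logreg(toy, is_hot ~ sea_surface_temp)
sum(is.na(toy_imp$is_hot))  # number of values still missing after imputation
```

Drawing with rbinom() rather than thresholding the probabilities at 0.5 keeps some of the natural variability in the imputed values.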

This exercise is part of the course

Handling Missing Data with Imputations in R

Exercise instructions

  • Set is_hot to NA in places where it was originally missing.
  • Impute is_hot with logistic regression, using sea_surface_temp as the only predictor; use your function impute_logreg().
  • Set humidity to NA in places where it was originally missing.
  • Impute humidity with linear regression, using sea_surface_temp and air_temp as predictors.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Initialize missing values with hot-deck
tao_imp <- hotdeck(tao)

# Create boolean masks for where is_hot and humidity are missing
missing_is_hot <- tao_imp$is_hot_imp
missing_humidity <- tao_imp$humidity_imp

for (i in 1:3) {
  # Set is_hot to NA in places where it was originally missing and re-impute it
  ___ <- NA
  tao_imp <- ___(tao_imp, ___ ~ ___)
  # Set humidity to NA in places where it was originally missing and re-impute it
  ___ <- NA
  tao_imp <- ___(tao_imp, ___ ~ sea_surface_temp + ___)
}
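
To see the mechanics of the loop end to end without filling in the template, here is a self-contained sketch on synthetic data. It uses base R only: init_fill() and impute_model() are hypothetical stand-ins for the template's hotdeck() and impute_lm() (which appear to come from the VIM and simputation packages) and for impute_logreg(). The data, the helper names, and their bodies are all assumptions made for illustration.

```r
set.seed(1)

# Synthetic stand-in for the tao data (hypothetical values)
n <- 300
toy <- data.frame(sea_surface_temp = rnorm(n, mean = 27, sd = 1.5))
toy$air_temp <- toy$sea_surface_temp - 1 + rnorm(n, sd = 1)
toy$humidity <- 80 + 2 * (toy$air_temp - 26) + rnorm(n, sd = 2)
toy$is_hot <- as.integer(toy$air_temp >= 26)
toy$is_hot[sample(n, 40)] <- NA
toy$humidity[sample(n, 40)] <- NA

# Stand-in for the initial hot-deck fill: replace each NA with a randomly
# chosen observed value and record the original missingness in <var>_imp
init_fill <- function(df, vars) {
  for (v in vars) {
    miss <- is.na(df[[v]])
    df[[paste0(v, "_imp")]] <- miss
    df[[v]][miss] <- sample(df[[v]][!miss], sum(miss), replace = TRUE)
  }
  df
}

# Stand-in imputer: fit on the observed rows, then fill the NAs of the
# response. Binary responses are drawn from the fitted Bernoulli distribution;
# continuous responses receive the linear regression prediction.
impute_model <- function(df, formula, binary = FALSE) {
  y <- all.vars(formula)[1]
  miss <- is.na(df[[y]])
  if (binary) {
    fit <- glm(formula, data = df, family = binomial)
    p <- predict(fit, newdata = df[miss, ], type = "response")
    df[[y]][miss] <- rbinom(sum(miss), size = 1, prob = p)
  } else {
    fit <- lm(formula, data = df)
    df[[y]][miss] <- predict(fit, newdata = df[miss, ])
  }
  df
}

toy_imp <- init_fill(toy, c("is_hot", "humidity"))

# Masks must be saved right after the initial fill, while the flags are fresh
missing_is_hot <- toy_imp$is_hot_imp
missing_humidity <- toy_imp$humidity_imp

for (i in 1:3) {
  # Reset is_hot where it was originally missing, then re-impute it
  toy_imp$is_hot[missing_is_hot] <- NA
  toy_imp <- impute_model(toy_imp, is_hot ~ sea_surface_temp, binary = TRUE)
  # Reset humidity where it was originally missing, then re-impute it
  toy_imp$humidity[missing_humidity] <- NA
  toy_imp <- impute_model(toy_imp, humidity ~ sea_surface_temp + air_temp)
}

colSums(is.na(toy_imp[c("is_hot", "humidity")]))
```

In this toy data the predictors are complete, so the repeated passes mainly illustrate the reset-and-re-impute pattern. Iterating matters more when the predictors themselves contain imputed values that are updated on each pass.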