Model-based imputation with multiple variable types
Great job on writing the function that implements logistic regression imputation with drawing from the conditional distribution. That's pretty advanced statistics you have coded! In this exercise, you will combine what you have learned so far about model-based imputation to impute different types of variables in the tao data.
Your task is to iterate over variables just like you did in the previous chapter and impute two variables:
- is_hot, a new binary variable created from air_temp, which is 1 if air_temp is at or above 26 degrees and 0 otherwise;
- humidity, a continuous variable you are already familiar with.
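As a minimal sketch of how a binary indicator like is_hot can be derived from air_temp, consider the toy data frame below. The name tao_toy and its values are made up for illustration and are not taken from the tao data.

```r
# Toy stand-in for the tao data; the values are invented for illustration.
tao_toy <- data.frame(air_temp = c(24.5, 26.0, 27.3, NA, 25.9))

# 1 if air_temp is at or above 26 degrees, 0 otherwise.
# A missing air_temp yields a missing is_hot.
tao_toy$is_hot <- as.integer(tao_toy$air_temp >= 26)

print(tao_toy)
```

Note that a missing temperature propagates to the indicator, which is why is_hot needs imputing in the first place.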
You will have to use the linear regression function you have learned before, as well as your own function for logistic regression. Let's get to it!
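The idea behind logistic regression imputation with drawing from the conditional distribution can be sketched in a few lines of base R. This is not the impute_logreg() function from the course, only an illustration of the technique on a made-up data frame (toy, with hypothetical values): fit the model on observed rows, predict the probability of a 1 for the missing rows, and then draw each imputed value from a Bernoulli distribution with that probability instead of rounding it.

```r
set.seed(1)

# Made-up data: a binary response that depends on one continuous predictor.
toy <- data.frame(sea_surface_temp = runif(50, min = 22, max = 30))
toy$is_hot <- rbinom(50, size = 1, prob = plogis(toy$sea_surface_temp - 26))
toy$is_hot[c(3, 17, 40)] <- NA  # introduce a few missing values

gaps <- is.na(toy$is_hot)

# glm() drops rows with a missing response by default, so the model
# is fitted on the observed rows only.
fit <- glm(is_hot ~ sea_surface_temp, data = toy, family = binomial)

# Predicted probability that is_hot equals 1 for each missing row.
p_hot <- predict(fit, newdata = toy[gaps, , drop = FALSE], type = "response")

# Draw each imputed value from Bernoulli(p_hot) rather than thresholding,
# so the imputations keep the randomness of the conditional distribution.
drawn <- rbinom(length(p_hot), size = 1, prob = p_hot)

toy_imp <- toy
toy_imp$is_hot[gaps] <- drawn
```

Thresholding p_hot at 0.5 would always impute the more likely class; drawing keeps some of the natural variability in the imputed values.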
This exercise is part of the course Handling Missing Data with Imputations in R.
Exercise instructions
- Set is_hot to NA in places where it was originally missing.
- Impute is_hot with logistic regression, using sea_surface_temp as the only predictor; use your function impute_logreg().
- Set humidity to NA in places where it was originally missing.
- Impute humidity with linear regression, using sea_surface_temp and air_temp as predictors.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize missing values with hot-deck
tao_imp <- hotdeck(tao)
# Create boolean masks for where is_hot and humidity are missing
missing_is_hot <- tao_imp$is_hot_imp
missing_humidity <- tao_imp$humidity_imp
for (i in 1:3) {
  # Set is_hot to NA in places where it was originally missing and re-impute it
  ___ <- NA
  tao_imp <- ___(tao_imp, ___ ~ ___)
  # Set humidity to NA in places where it was originally missing and re-impute it
  ___ <- NA
  tao_imp <- ___(tao_imp, ___ ~ sea_surface_temp + ___)
}
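The same chained pattern can be tried out on made-up data using base R only. Everything in the sketch below is an assumption made for illustration: the toy data frame, the crude random-draw initialization that stands in for hot-deck imputation, and the hypothetical helpers fill_logreg() and fill_linreg(), which stand in for impute_logreg() and the course's linear regression function. The masks are computed with is.na() before initialization; in the sample code above, they are read from the logical indicator columns that hotdeck() appends (assuming the hotdeck() function from the VIM package).

```r
set.seed(42)

# Toy stand-in for tao (invented values, not the real data).
n <- 100
toy <- data.frame(sea_surface_temp = runif(n, min = 22, max = 30))
toy$air_temp <- toy$sea_surface_temp - 1 + rnorm(n, sd = 0.5)
toy$humidity <- 120 - 1.5 * toy$air_temp + rnorm(n, sd = 2)
toy$is_hot   <- as.integer(toy$air_temp >= 26)
toy$is_hot[sample(n, 10)]   <- NA
toy$humidity[sample(n, 10)] <- NA

# Boolean masks marking where each variable was originally missing.
missing_is_hot   <- is.na(toy$is_hot)
missing_humidity <- is.na(toy$humidity)

# Crude initialization: fill each gap with a randomly chosen observed value.
toy_imp <- toy
toy_imp$is_hot[missing_is_hot] <-
  sample(toy$is_hot[!missing_is_hot], sum(missing_is_hot), replace = TRUE)
toy_imp$humidity[missing_humidity] <-
  sample(toy$humidity[!missing_humidity], sum(missing_humidity), replace = TRUE)

# Hypothetical helper: logistic regression imputation with a Bernoulli draw.
fill_logreg <- function(df, formula) {
  target <- all.vars(formula)[1]
  gaps <- is.na(df[[target]])
  fit <- glm(formula, data = df, family = binomial)
  p <- predict(fit, newdata = df[gaps, , drop = FALSE], type = "response")
  df[[target]][gaps] <- rbinom(sum(gaps), size = 1, prob = p)
  df
}

# Hypothetical helper: linear regression imputation with predicted means.
fill_linreg <- function(df, formula) {
  target <- all.vars(formula)[1]
  gaps <- is.na(df[[target]])
  fit <- lm(formula, data = df)
  df[[target]][gaps] <- predict(fit, newdata = df[gaps, , drop = FALSE])
  df
}

for (i in 1:3) {
  # Reset is_hot where it was originally missing, then re-impute it.
  toy_imp$is_hot[missing_is_hot] <- NA
  toy_imp <- fill_logreg(toy_imp, is_hot ~ sea_surface_temp)
  # Reset humidity where it was originally missing, then re-impute it.
  toy_imp$humidity[missing_humidity] <- NA
  toy_imp <- fill_linreg(toy_imp, humidity ~ sea_surface_temp + air_temp)
}
```

Resetting each variable to NA before refitting ensures that the model is estimated on originally observed values of that variable only, while the other variables contribute their current imputations as predictors.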