Model-based imputation with multiple variable types
Great job on writing the function that implements logistic regression imputation by drawing from the conditional distribution. That's some pretty advanced statistics you have coded! In this exercise, you will combine what you have learned so far about model-based imputation to impute different types of variables in the tao data.
Your task is to iterate over variables just like you have done in the previous chapter and impute two variables:
- is_hot, a new binary variable that was created out of air_temp, which is 1 if air_temp is at or above 26 degrees and is 0 otherwise;
- humidity, a continuous variable you are already familiar with.
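As a small illustration of the is_hot definition above, here is a sketch on made-up temperatures (the values are assumptions, not taken from tao). It also shows that a missing air_temp yields a missing is_hot.

```r
# Toy air temperatures (assumed values for illustration only)
air_temp <- c(27.6, 25.1, NA, 26.0, 24.3)

# 1 if at or above 26 degrees, 0 otherwise; a missing temperature stays NA
is_hot <- as.integer(air_temp >= 26)
print(is_hot)
```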
You will have to use the linear regression function you have learned before, as well as your own function for logistic regression. Let's get to it!
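For readers who do not have the earlier exercise at hand, the following is a minimal sketch of what logistic regression imputation with draws from the conditional distribution might look like in base R. The name impute_logreg_sketch and the toy data are hypothetical; this is not necessarily how your own impute_logreg() is written.

```r
# Hypothetical helper (not the course's impute_logreg()): impute a binary
# variable by fitting a logistic regression on the observed rows and drawing
# 0/1 values from the fitted conditional distribution.
impute_logreg_sketch <- function(df, formula) {
  target <- all.vars(formula)[1]        # name of the response variable
  missing_rows <- is.na(df[[target]])   # rows that need imputing
  if (!any(missing_rows)) return(df)    # nothing to do

  # glm() drops rows with a missing response under the default na.action
  fit <- glm(formula, data = df, family = binomial)

  # Predicted P(target = 1 | predictors) for the rows to impute
  probs <- predict(fit, newdata = df[missing_rows, , drop = FALSE],
                   type = "response")

  # Draw from Bernoulli(probs) instead of thresholding at 0.5
  df[[target]][missing_rows] <- rbinom(sum(missing_rows), size = 1, prob = probs)
  df
}

# Toy data (assumed, only to exercise the function)
set.seed(42)
toy <- data.frame(sea_surface_temp = runif(200, 22, 30))
toy$is_hot <- rbinom(200, 1, plogis(toy$sea_surface_temp - 26))
toy$is_hot[sample(200, 30)] <- NA

toy_imp <- impute_logreg_sketch(toy, is_hot ~ sea_surface_temp)
```

Drawing with rbinom() rather than rounding the predicted probability keeps some of the natural variability of the binary variable in the imputed values.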
This exercise is part of the course
Handling Missing Data with Imputations in R
Instructions
- Set is_hot to NA in places where it was originally missing.
- Impute is_hot with logistic regression, using sea_surface_temp as the only predictor; use your function impute_logreg().
- Set humidity to NA in places where it was originally missing.
- Impute humidity with linear regression, using sea_surface_temp and air_temp as predictors.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# hotdeck() comes from the VIM package
library(VIM)

# Initialize missing values with hot-deck
tao_imp <- hotdeck(tao)

# Create boolean masks for where is_hot and humidity are missing
# (hotdeck() adds logical *_imp columns flagging the imputed values)
missing_is_hot <- tao_imp$is_hot_imp
missing_humidity <- tao_imp$humidity_imp

for (i in 1:3) {
  # Set is_hot to NA in places where it was originally missing and re-impute it
  ___ <- NA
  tao_imp <- ___(tao_imp, ___ ~ ___)
  # Set humidity to NA in places where it was originally missing and re-impute it
  ___ <- NA
  tao_imp <- ___(tao_imp, ___ ~ sea_surface_temp + ___)
}
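The same iterate-and-re-impute pattern can be tried outside the exercise on synthetic data. The sketch below uses base R only and is not the exercise solution: the toy data frame, its coefficients, and the random-draw initialization (a crude stand-in for hot-deck) are all assumptions made for illustration. The linear step fills in the predicted mean, and air_temp is kept complete to keep the example short.

```r
set.seed(1)
n <- 300

# Synthetic stand-in for tao (all values are assumptions)
toy <- data.frame(sea_surface_temp = runif(n, 22, 30))
toy$air_temp <- toy$sea_surface_temp - 1 + rnorm(n, sd = 1)
toy$humidity <- 120 - 1.5 * toy$air_temp + rnorm(n, sd = 2)
toy$is_hot <- as.integer(toy$air_temp >= 26)

# Knock out some values and keep boolean masks of where they were
missing_is_hot <- seq_len(n) %in% sample(n, 40)
missing_humidity <- seq_len(n) %in% sample(n, 40)
toy$is_hot[missing_is_hot] <- NA
toy$humidity[missing_humidity] <- NA

# Crude initialization: fill gaps with random draws from the observed values
toy_imp <- toy
toy_imp$is_hot[missing_is_hot] <-
  sample(toy$is_hot[!missing_is_hot], sum(missing_is_hot), replace = TRUE)
toy_imp$humidity[missing_humidity] <-
  sample(toy$humidity[!missing_humidity], sum(missing_humidity), replace = TRUE)

for (i in 1:3) {
  # Binary variable: reset to NA, fit logistic regression, draw from Bernoulli(p)
  toy_imp$is_hot[missing_is_hot] <- NA
  fit_bin <- glm(is_hot ~ sea_surface_temp, data = toy_imp, family = binomial)
  p <- predict(fit_bin, newdata = toy_imp[missing_is_hot, ], type = "response")
  toy_imp$is_hot[missing_is_hot] <- rbinom(length(p), size = 1, prob = p)

  # Continuous variable: reset to NA, fit linear regression, fill predicted mean
  toy_imp$humidity[missing_humidity] <- NA
  fit_lin <- lm(humidity ~ sea_surface_temp + air_temp, data = toy_imp)
  toy_imp$humidity[missing_humidity] <-
    predict(fit_lin, newdata = toy_imp[missing_humidity, ])
}
```

Resetting each variable to NA before refitting ensures that every model is estimated only on originally observed values of its own response, while still using the latest imputations of the other variables as predictors.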