Logistic regression imputation
A popular choice for imputing binary variables is logistic regression. Unfortunately, there is no function similar to impute_lm() that would do it. That's why you'll write such a function yourself!
Let's call the function impute_logreg(). Its first argument will be a data frame df in which missing values have already been initialized, so that the only remaining missing values are in the column to be imputed. The second argument will be a formula for the logistic regression model.
The function will do the following:
- Keep the locations of missing values.
- Build the model.
- Make predictions.
- Replace missing values with predictions.
Don't worry about the line creating imp_var - this is just a way to extract the name of the column to impute from the formula. Let's do some functional programming!
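To make the imp_var line less mysterious, here is a small sketch of what it does. The formula and its column names (is_sick, age, bmi) are made up for illustration and do not come from the course data.

```r
# Hypothetical formula; is_sick, age, and bmi are invented column names.
f <- is_sick ~ age + bmi

# Position 2 of a two-sided formula holds its left-hand side (the response).
imp_var <- as.character(f[2])
print(imp_var)

# An alternative for simple formulas: all.vars() lists the variables in
# order of appearance, so the response comes first.
print(all.vars(f)[1])
```

Both approaches assume the response is a plain column name rather than a transformed term such as log(y).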
This exercise is part of the course
Handling Missing Data with Imputations in R
Instructions
- Create a boolean mask for where df[imp_var] is missing and assign it to missing_imp_var.
- Fit a logistic regression model using the formula and data that the function will get as arguments, while remembering to set the correct family to ensure a logistic regression is fit (pass it without quotation marks); assign the model to logreg_model.
- Predict the response with the model and assign it to preds; remember to set the appropriate prediction type.
- Use preds alongside missing_imp_var to impute missing values.
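The family and prediction type matter because glm() and predict() work on different scales by default. The sketch below uses simulated toy data (invented for illustration, not the course data) to show the difference between the default link-scale predictions and type = "response".

```r
set.seed(1)
# Toy data, invented for illustration: a binary outcome y driven by x.
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, size = 1, prob = plogis(0.5 + 1.5 * toy$x))

# family = binomial (no quotation marks) requests a logistic regression.
fit <- glm(y ~ x, data = toy, family = binomial)

# The default type = "link" returns log-odds, which are unbounded.
log_odds <- predict(fit)
# type = "response" returns probabilities between 0 and 1.
probs <- predict(fit, type = "response")

print(range(log_odds))
print(range(probs))

# The two scales are related through the logistic function.
print(all.equal(unname(probs), unname(plogis(log_odds))))
```

Thresholding at 0.5, as the exercise template does, only makes sense on the probability scale, which is why the prediction type needs to be set explicitly.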
Hands-on interactive exercise
Try this exercise by completing this sample code.
impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- ___
  # Fit logistic regression model
  logreg_model <- ___(___, data = ___, family = ___)
  # Predict the response and convert it to 0s and 1s
  preds <- predict(___, type = ___)
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- ___[___]
  return(df)
}
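For reference, here is one way the template above might be completed, tried out on simulated data. It is a sketch rather than the official solution, and it deviates from the template in two labeled places: it builds the mask with df[[imp_var]] so that the mask is a plain logical vector, and it passes newdata = df to predict() so that there is one prediction per row of df even though glm() drops rows with a missing response when fitting. The function name impute_logreg_sketch and the toy data are hypothetical.

```r
# Sketch of a completed function; not the official course solution.
impute_logreg_sketch <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Logical vector: TRUE where the response is missing
  # (assumption: df[[imp_var]] is used instead of df[imp_var])
  missing_imp_var <- is.na(df[[imp_var]])
  # Fit logistic regression; rows with a missing response are dropped here
  logreg_model <- glm(formula, data = df, family = binomial)
  # Assumption: newdata = df yields one predicted probability per row of df
  preds <- predict(logreg_model, newdata = df, type = "response")
  # Convert probabilities to 0s and 1s
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Overwrite only the originally missing entries
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  df
}

# Toy data, invented for illustration: binary y depends on x.
set.seed(42)
toy <- data.frame(x = rnorm(100))
toy$y <- rbinom(100, size = 1, prob = plogis(2 * toy$x))
toy$y[c(3, 17, 58)] <- NA  # knock out a few responses

imputed <- impute_logreg_sketch(toy, y ~ x)
print(sum(is.na(toy$y)))      # missing values before imputation
print(sum(is.na(imputed$y)))  # missing values after imputation
```

Observed values are left untouched; only the rows flagged in missing_imp_var receive model predictions.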