Logistic regression imputation
A popular choice for imputing binary variables is logistic regression. Unfortunately, there is no function similar to impute_lm() that would do it. That's why you'll write such a function yourself!
Let's call the function impute_logreg(). Its first argument will be a data frame df whose missing values have been initialized and which contains missing values only in the column to be imputed. The second argument will be a formula for the logistic regression model.
The function will do the following:
- Keep the locations of missing values.
- Build the model.
- Make predictions.
- Replace missing values with predictions.
Don't worry about the line creating imp_var: it is just a way to extract the name of the column to impute from the formula. Let's do some functional programming!
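To see what the imp_var line does, here is a minimal sketch. The formula f and its variable names are hypothetical and serve only as an illustration; all.vars() is a base R alternative that gives the same name when the left-hand side of the formula is a single column name.

```r
# Hypothetical formula, used only for illustration
f <- y ~ x1 + x2

# Same expression as in the exercise code: the second element of a
# formula is its left-hand side, i.e. the response
imp_var <- as.character(f[2])

# Base R alternative: the first variable named in the formula
imp_var_alt <- all.vars(f)[1]

print(imp_var)
print(imp_var_alt)
```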
This exercise is part of the course
Handling Missing Data with Imputations in R
Exercise instructions
- Create a boolean mask for where df[imp_var] is missing and assign it to missing_imp_var.
- Fit a logistic regression model using the formula and data that the function will get as arguments, while remembering to set the correct family to ensure a logistic regression is fit (pass it without quotation marks); assign the model to logreg_model.
- Predict the response with the model and assign it to preds; remember to set the appropriate prediction type.
- Use preds alongside missing_imp_var to impute missing values.
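The family and prediction type mentioned above can be illustrated outside the exercise. The sketch below uses the built-in mtcars data, which is not part of this exercise, with the 0/1 column am as the response. It shows that predict() on a glm returns log-odds by default and probabilities only when type = "response" is requested.

```r
# Fit a logistic regression on built-in data; family is passed unquoted
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Default prediction type: the linear predictor (log-odds)
log_odds <- predict(fit, type = "link")

# Probabilities between 0 and 1
probs <- predict(fit, type = "response")

# plogis() is the inverse logit, so the two scales agree after transforming
print(all.equal(probs, plogis(log_odds)))
```

Thresholding at 0.5 to obtain 0s and 1s only makes sense on the probability scale.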
Hands-on interactive exercise
Try this exercise by completing this sample code.
impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- ___
  # Fit logistic regression model
  logreg_model <- ___(___, data = ___, family = ___)
  # Predict the response and convert it to 0s and 1s
  preds <- predict(___, type = ___)
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- ___[___]
  return(df)
}
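One possible way to fill in the blanks is sketched below. It is not the course's official solution, and it makes two assumptions that go beyond the exercise code: the mask is built with df[[imp_var]] so that it is a plain logical vector, and predict() is given newdata = df so that a prediction is returned for every row, including rows where the response is missing (glm() drops those rows when fitting by default). The data frame toy and its columns are made up for the demonstration.

```r
impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing (plain logical vector)
  missing_imp_var <- is.na(df[[imp_var]])
  # Fit logistic regression model; rows with a missing response are dropped
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict probabilities for all rows (assumption: newdata = df keeps
  # predictions aligned with the rows of df), then convert to 0s and 1s
  preds <- predict(logreg_model, newdata = df, type = "response")
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}

# Hypothetical toy data: binary y depends on x, with three values removed
set.seed(42)
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(2 * toy$x))
toy$y[c(3, 50, 120)] <- NA

imputed <- impute_logreg(toy, y ~ x)
print(sum(is.na(toy$y)))
print(sum(is.na(imputed$y)))
```

Only the rows flagged in missing_imp_var are overwritten; observed values of the response are left as they were.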