Exercise

Logistic regression imputation

A popular choice for imputing binary variables is logistic regression. Unfortunately, there is no function similar to impute_lm() that would do it. That's why you'll write such a function yourself!

Let's call the function impute_logreg(). Its first argument will be a data frame df whose missing values have been initialized and which contains missing values only in the column to be imputed. The second argument will be a formula for the logistic regression model.

The function will do the following:

  • Keep the locations of missing values.
  • Build the model.
  • Make predictions.
  • Replace missing values with predictions.

Don't worry about the line creating imp_var; it is just a way to extract the name of the column to impute from the formula. Let's do some functional programming!
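To make the imp_var idea concrete, here is a minimal sketch of one way to pull the response column's name out of a formula using base R only. The formula y ~ x1 + x2 and the names example_formula and imp_var_alt are made up for illustration, and the exercise's own imp_var line may use a different helper.

```r
# Hypothetical formula, used only for illustration.
example_formula <- y ~ x1 + x2

# all.vars() lists the variable names in a formula, left-hand side first,
# so the first element is the column to impute.
imp_var <- all.vars(example_formula)[1]

# Alternative for a simple response: the second element of a two-sided
# formula is its left-hand side.
imp_var_alt <- as.character(example_formula[[2]])

print(imp_var)
```

Either way, the result is a plain character string that can be used to index the data frame, as in df[imp_var].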

Instructions

  • Create a boolean mask for where df[imp_var] is missing and assign it to missing_imp_var.
  • Fit a logistic regression model using the formula and data that the function receives as arguments. Set the correct family so that a logistic regression is fit (pass it without quotation marks), and assign the model to logreg_model.
  • Predict the response with the model and assign it to preds; remember to set the appropriate prediction type.
  • Use preds alongside missing_imp_var to impute missing values.
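Put together, the four instructions could look roughly like the sketch below. It is one possible reading of the steps, not the exercise's reference solution, and it rests on several assumptions: imp_var is extracted with all.vars() as a stand-in for the exercise's own line, predictions are requested with newdata = df so that preds has one value per row of df, predicted probabilities are turned into 0/1 values with a 0.5 cutoff, and the toy data frame is simulated purely for illustration.

```r
impute_logreg <- function(df, formula) {
  # Assumption: the response is the first variable named in the formula.
  imp_var <- all.vars(formula)[1]

  # Keep the locations of missing values.
  missing_imp_var <- is.na(df[[imp_var]])

  # Build the model; family = binomial gives a logistic regression.
  # glm() drops rows with a missing response when fitting.
  logreg_model <- glm(formula, data = df, family = binomial)

  # Predict probabilities for every row, including those with a missing response.
  preds <- predict(logreg_model, newdata = df, type = "response")

  # Assumption: threshold the probabilities at 0.5 to obtain binary values,
  # then replace only the missing entries.
  df[[imp_var]][missing_imp_var] <- ifelse(preds[missing_imp_var] >= 0.5, 1, 0)
  df
}

# Simulated toy data (illustrative only): a binary y that depends on x.
set.seed(1)
toy <- data.frame(x = rnorm(100))
toy$y <- rbinom(100, size = 1, prob = plogis(toy$x))
toy$y[c(3, 10, 42)] <- NA

imputed <- impute_logreg(toy, y ~ x)
print(imputed[c(3, 10, 42), ])
```

Thresholding at 0.5 always imputes the most likely class. It is the simplest choice here, but other ways of converting the predicted probabilities into binary values are possible.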