Exercise

# Logistic regression imputation

A popular choice for imputing binary variables is logistic regression. Unfortunately, there is no function similar to `impute_lm()` that would do it. That's why you'll write such a function yourself!

Let's call the function `impute_logreg()`. Its first argument will be a data frame `df` whose missing values have been initialized and which contains missing values only in the column to be imputed. The second argument will be a `formula` for the logistic regression model.

The function will do the following:

- Keep the locations of missing values.
- Build the model.
- Make predictions.
- Replace missing values with predictions.

Don't worry about the line creating `imp_var`: this is just a way to extract the name of the column to impute from the formula. Let's do some functional programming!
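To make the role of `imp_var` concrete, here is a minimal sketch of one way to pull the response name out of a formula. It assumes the formula is passed as a formula object and uses base R's `all.vars()`; the line provided in the exercise may do this differently, and the variable names `is_sick`, `age`, and `weight` are made up for illustration.

```r
# Hypothetical formula; the variable names are illustrative only.
f <- is_sick ~ age + weight

# all.vars() returns the variable names in order of appearance,
# so the first element is the response (the left-hand side).
imp_var <- all.vars(f)[1]
print(imp_var)
```

The result is a plain character string, which can then be used to index the data frame column that needs imputing.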

Instructions


- Create a boolean mask for where `df[imp_var]` is missing and assign it to `missing_imp_var`.
- Fit a logistic regression model using the formula and data that the function will get as arguments, while remembering to set the correct `family` to ensure a logistic regression is fit (pass it without quotation marks); assign the model to `logreg_model`.
- Predict the response with the model and assign it to `preds`; remember to set the appropriate prediction `type`.
- Use `preds` alongside `missing_imp_var` to impute missing values.
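
The four steps could fit together roughly as in the sketch below. This is one possible implementation under stated assumptions, not necessarily the exercise's reference solution: the `all.vars()` extraction, the use of `newdata = df` so that every row gets a prediction, the 0.5 cutoff for turning probabilities into classes, and the simulated `toy` data are all choices made here for illustration.

```r
impute_logreg <- function(df, formula) {
  # Name of the column to impute (assumed extraction method).
  imp_var <- all.vars(formula)[1]
  # Keep the locations of missing values as a plain logical vector.
  missing_imp_var <- is.na(df[[imp_var]])
  # Build the model; family = binomial makes glm() fit a logistic regression.
  # Rows with a missing response are dropped from the fit by default.
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predicted probabilities for every row, including rows with a missing response.
  preds <- predict(logreg_model, newdata = df, type = "response")
  # Assumption: convert probabilities to 0/1 classes with a 0.5 cutoff.
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Replace missing values with predictions; observed values stay untouched.
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  df
}

# Simulated data for illustration only.
set.seed(1)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(1.5 * toy$x))
na_rows <- c(3, 50, 120)
toy$y[na_rows] <- NA

imputed <- impute_logreg(toy, y ~ x)
```

Passing `newdata = df` matters in this sketch because, without it, `predict()` returns values only for the rows used in fitting, which would not line up with the missing-value mask.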