
Logistic regression imputation

A popular choice for imputing binary variables is logistic regression. Unfortunately, there is no function similar to impute_lm() that would do it. That's why you'll write such a function yourself!

Let's call the function impute_logreg(). Its first argument will be a data frame df in which missing values have already been initialized, so that the only remaining missing values are in the column to be imputed. The second argument will be a formula for the logistic regression model.

The function will do the following:

  • Keep the locations of missing values.
  • Build the model.
  • Make predictions.
  • Replace missing values with predictions.

Don't worry about the line creating imp_var - this is just a way to extract the name of the column to impute from the formula. Let's do some functional programming!
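To see what the imp_var line does, here is a minimal sketch using a hypothetical formula y ~ x1 + x2 (these variable names are placeholders, not from the course data). Subsetting the formula with [2] keeps only its left-hand side, and as.character() turns that into the column name. all.vars() is shown as an alternative that should give the same result for a simple response.

```r
# Hypothetical formula; y, x1 and x2 are placeholder names
f <- y ~ x1 + x2

# The approach used in the exercise: keep the left-hand side, convert to text
imp_var <- as.character(f[2])
print(imp_var)

# An alternative: all.vars() lists the variables in order, response first
imp_var_alt <- all.vars(f)[1]
print(imp_var_alt)
```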

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • Create a boolean mask for where df[imp_var] is missing and assign it to missing_imp_var.
  • Fit a logistic regression model using the formula and data that the function receives as arguments. Set the family so that a logistic regression is fit (pass it without quotation marks), and assign the model to logreg_model.
  • Predict the response with the model and assign it to preds; remember to set the appropriate prediction type.
  • Use preds alongside missing_imp_var to impute missing values.
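The point about the prediction type can be illustrated on R's built-in mtcars data, which is used here only as a stand-in and is not part of the exercise. In this sketch, predict() on a binomial glm returns values on the log-odds (link) scale by default, while type = "response" returns probabilities, which is the scale a 0.5 threshold assumes.

```r
# Stand-in example: model the binary column am from wt in the built-in mtcars data
m <- glm(am ~ wt, data = mtcars, family = binomial)

# Default prediction type is "link": log-odds, not bounded to [0, 1]
link <- predict(m)

# type = "response" gives predicted probabilities
probs <- predict(m, type = "response")

# The two scales are related by the inverse logit, plogis()
print(head(cbind(link = link, probs = probs, inv_logit = plogis(link))))
```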

Hands-on interactive exercise

Try this exercise by completing this sample code.

impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- ___
  # Fit logistic regression model
  logreg_model <- ___(___, data = ___, family = ___)
  # Predict the response and convert it to 0s and 1s
  preds <- predict(___, type = ___)
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- ___[___]
  return(df)
}
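One possible way to fill in the blanks is sketched below; it is not necessarily the course's reference solution. It departs from the template in two places, both of which are assumptions of this sketch: the mask is built with df[[imp_var]] so that it is a plain logical vector, and predict() is given newdata = df. By default glm() drops rows with a missing response when fitting, so without newdata the predictions would cover only the rows used for fitting and would not line up with the mask. The toy data frame and its columns x and y are hypothetical.

```r
impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Logical vector marking rows where the response is missing
  missing_imp_var <- is.na(df[[imp_var]])
  # Fit logistic regression model (rows with a missing response are dropped)
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict probabilities for every row of df, then convert to 0s and 1s
  preds <- predict(logreg_model, newdata = df, type = "response")
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}

# Hypothetical toy data: a binary y that depends on x, with three values removed
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- rbinom(50, size = 1, prob = plogis(2 * toy$x))
toy$y[c(3, 17, 42)] <- NA

toy_imp <- impute_logreg(toy, y ~ x)
print(toy_imp[c(3, 17, 42), ])
```

The observed values of y are left untouched; only the rows flagged in missing_imp_var receive a thresholded prediction.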