
Logistic regression imputation

A popular choice for imputing binary variables is logistic regression. Unfortunately, there is no function similar to impute_lm() that would do it. That's why you'll write such a function yourself!

Let's call the function impute_logreg(). Its first argument will be a data frame df in which missing values have already been initialized, so that the only remaining missing values are in the column to be imputed. The second argument will be a formula for the logistic regression model.

The function will do the following:

  • Keep the locations of missing values.
  • Build the model.
  • Make predictions.
  • Replace missing values with predictions.

Don't worry about the line creating imp_var: it is just a way to extract the name of the column to impute from the formula. Let's do some functional programming!
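As a quick illustration of what the imp_var line does, here is a minimal sketch. The formula and its variable names (smoker, age, bmi) are hypothetical and only serve as an example.

```r
# Hypothetical formula; the variable names are illustrative only
f <- smoker ~ age + bmi

# A two-sided formula has three parts: the `~` operator, the left-hand side,
# and the right-hand side. Subsetting with [2] keeps the left-hand side,
# and as.character() turns it into a string.
imp_var <- as.character(f[2])
print(imp_var)
```

The resulting string can then be used to index the data frame, as in df[imp_var].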

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • Create a boolean mask for where df[imp_var] is missing and assign it to missing_imp_var.
  • Fit a logistic regression model using the formula and data that the function receives as arguments. Set the correct family so that a logistic regression is fit (pass it without quotation marks), and assign the model to logreg_model.
  • Predict the response with the model and assign it to preds; remember to set the appropriate prediction type.
  • Use preds alongside missing_imp_var to impute missing values.
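The prediction type matters because predict() on a glm object returns values on the link (log-odds) scale by default, while the 0.5 threshold used later only makes sense for probabilities. The sketch below shows the difference on simulated data; the data frame toy and its columns are assumptions made up for this illustration.

```r
# Simulated data; all names and values here are illustrative assumptions
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- rbinom(50, size = 1, prob = plogis(toy$x))

# family = binomial (no quotation marks) requests a logistic regression
model <- glm(y ~ x, data = toy, family = binomial)

# type = "link" (the default) returns log-odds, which can be any real number
log_odds <- predict(model, type = "link")

# type = "response" returns fitted probabilities between 0 and 1
probs <- predict(model, type = "response")

# The two scales are related through the logistic function
print(all.equal(plogis(log_odds), probs))
print(range(probs))
```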

Hands-on interactive exercise

Try this exercise by completing this sample code.

impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- ___
  # Fit logistic regression model
  logreg_model <- ___(___, data = ___, family = ___)
  # Predict the response and convert it to 0s and 1s
  preds <- predict(___, type = ___)
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- ___[___]
  return(df)
}
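For reference, one possible way to fill in the blanks is sketched below. It is not necessarily the course's official solution, and it departs from the template in two labeled places: it uses is.na(df[[imp_var]]) to get a plain logical vector, and it passes newdata = df to predict() so that predictions are produced for every row, including those with a missing response that glm() drops when fitting. The data frame toy and its columns are made up for the usage example.

```r
impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  # (assumption: [[ ]] is used here to obtain a plain logical vector)
  missing_imp_var <- is.na(df[[imp_var]])
  # Fit logistic regression model; rows with a missing response are dropped by default
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict the response for every row and convert it to 0s and 1s
  # (assumption: newdata = df is added so preds lines up with the rows of df)
  preds <- predict(logreg_model, newdata = df, type = "response")
  preds <- ifelse(preds >= 0.5, 1, 0)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}

# Usage on simulated data (illustrative only)
set.seed(42)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- rbinom(100, size = 1, prob = plogis(1.5 * toy$x1 - toy$x2))
toy$y[c(3, 17, 58)] <- NA

toy_imp <- impute_logreg(toy, y ~ x1 + x2)
print(sum(is.na(toy_imp$y)))
print(toy_imp$y[c(3, 17, 58)])
```

Note that thresholding at 0.5 always imputes the more likely class, so the imputed values carry no randomness.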