Drawing from conditional distribution

Calling predict() on a model always returns the same value for the same values of the predictors. As a result, the imputed data show too little variability. To make the imputation reproduce the variability of the original data, we can draw from the conditional distribution instead. That is, rather than always predicting 1 whenever the model outputs a probability of 0.5 or more, we draw the prediction from a binomial distribution parameterized by the probability the model returns.
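
As a minimal sketch of the difference, assume a hypothetical vector of fitted probabilities `probs` (made up for illustration, not taken from the course data). Thresholding maps equal probabilities to equal predictions, while `rbinom()` with `size = 1` makes one Bernoulli draw per probability:

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical fitted probabilities, standing in for predict(..., type = "response")
probs <- rep(c(0.2, 0.6, 0.9), each = 1000)

# Deterministic rule: the same probability always gives the same prediction
thresholded <- ifelse(probs >= 0.5, 1, 0)

# Stochastic rule: one Bernoulli draw (size = 1) per probability
drawn <- rbinom(length(probs), size = 1, prob = probs)

# Share of 1s within each probability group
tapply(thresholded, probs, mean)  # exactly 0, 1, 1
tapply(drawn, probs, mean)        # roughly 0.2, 0.6, 0.9
```

With thresholding, every observation with probability 0.6 is predicted as 1. With drawing, only about 60% of them are, which is closer to how the outcome varies in real data.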

You will work with the code you wrote in the previous exercise. The following line has been removed:

  preds <- ifelse(preds >= 0.5, 1, 0)

Your task is to replace it with a draw from a binomial distribution. That's just one line of code!

This exercise is part of the course

Handling Missing Data with Imputations in R

Exercise instructions

  • Overwrite preds by sampling from a binomial distribution.
  • Pass the length of preds as the first argument.
  • Set size to 1.
  • Set prob to the probabilities returned by the model.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- is.na(df[imp_var])
  # Fit logistic regression model
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict the response
  preds <- predict(logreg_model, type = "response")
  # Sample the predictions from a binomial distribution
  preds <- ___(___, size = ___, prob = ___)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}
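
To see how such a function behaves end to end, here is a self-contained sketch on simulated data. The data frame `toy`, its columns `x` and `y`, and the function name `impute_logreg_sketch` are all hypothetical. Unlike the template above, this version passes `newdata = df` to `predict()` so that rows with a missing response also receive a probability, and it extracts the response name with `all.vars()`. Both choices are assumptions made for this standalone example, not part of the exercise.

```r
set.seed(1)

# Simulated data: binary y depends on x; 30 of 200 responses are set to NA
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * toy$x))
toy$y[sample(n, 30)] <- NA

impute_logreg_sketch <- function(df, formula) {
  # Name of the response variable (left-hand side of the formula)
  imp_var <- all.vars(formula)[1]
  # Rows where the response is missing
  missing_imp_var <- is.na(df[[imp_var]])
  # glm() drops rows with a missing response when fitting (default na.action)
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predicted probabilities for every row, including those with missing y
  preds <- predict(logreg_model, newdata = df, type = "response")
  # One Bernoulli draw per row instead of thresholding at 0.5
  preds <- rbinom(length(preds), size = 1, prob = preds)
  # Fill in only the rows where the response was missing
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  df
}

imputed <- impute_logreg_sketch(toy, y ~ x)
sum(is.na(imputed$y))           # 0: every missing response has been filled
table(imputed$y[is.na(toy$y)])  # counts of imputed 0s and 1s
```

Because the imputed values are random draws, running the function again without resetting the seed will generally give different imputations for the same rows. The observed values are left untouched.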