
Drawing from conditional distribution

Calling predict() on a model always returns the same value for the same values of the predictors, so the imputed data vary less than the original data. To make the imputation replicate the variability of the original data, we can draw from the conditional distribution. This means that instead of always predicting 1 whenever the model outputs a probability of 0.5 or more, we draw the prediction from a binomial distribution whose success probability is the one returned by the model.
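The difference between the two rules can be sketched on a handful of made-up probabilities. The vector probs below is hypothetical and only stands in for the output of predict(..., type = "response"); the seed and the number of repetitions are arbitrary choices.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical predicted probabilities (not taken from any fitted model)
probs <- c(0.10, 0.45, 0.55, 0.90)

# Deterministic rule: the same probabilities always give the same labels
thresholded <- ifelse(probs >= 0.5, 1, 0)

# Stochastic rule: each label is one Bernoulli trial with its own probability
drawn <- rbinom(length(probs), size = 1, prob = probs)

# Over many repetitions, the share of 1s in each position approaches probs
many_draws <- replicate(10000, rbinom(length(probs), size = 1, prob = probs))
rowMeans(many_draws)
```

With thresholding, an observation with probability 0.45 is always labelled 0. With drawing, it is labelled 1 in roughly 45% of the draws, which is where the extra variability in the imputed values comes from.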

You will work on the code you wrote in the previous exercise. The following line has been removed:

  preds <- ifelse(preds >= 0.5, 1, 0)

Your task is to replace it with a draw from a binomial distribution. That's just one line of code!

This exercise is part of the course

Handling Missing Data with Imputations in R

Instructions

  • Overwrite preds by sampling from a binomial distribution.
  • Pass the length of preds as the first argument.
  • Set size to 1.
  • Set prob to the probabilities returned by the model.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- is.na(df[imp_var])
  # Fit logistic regression model
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict the response for every row of df; without newdata, rows with a
  # missing response are dropped and preds no longer lines up with df
  preds <- predict(logreg_model, newdata = df, type = "response")
  # Sample the predictions from binomial distribution
  preds <- ___(___, size = ___, prob = ___)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}
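To see the whole procedure run end to end, here is a minimal sketch on simulated data. The data frame toy, the function name impute_logreg_sketch, and the use of rbinom() to fill the blank are all assumptions made for illustration. They show one possible completion, not necessarily the course's reference solution.

```r
set.seed(1)

# Simulated data (hypothetical): binary response y depends on predictor x
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$x))
toy$y[sample(n, 40)] <- NA  # remove 40 responses at random

impute_logreg_sketch <- function(df, formula) {
  imp_var <- all.vars(formula)[1]           # name of the response variable
  missing_imp_var <- is.na(df[[imp_var]])   # rows where the response is missing
  logreg_model <- glm(formula, data = df, family = binomial)
  # newdata = df yields one probability per row, including rows
  # that glm() dropped because the response was missing
  preds <- predict(logreg_model, newdata = df, type = "response")
  # One Bernoulli draw per row, each with its own predicted probability
  preds <- rbinom(length(preds), size = 1, prob = preds)
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  df
}

imputed <- impute_logreg_sketch(toy, y ~ x)
sum(is.na(imputed$y))  # 0, since x has no missing values here
table(imputed$y)
```

Because the draws are random, running the sketch again without set.seed() may impute different values in the 40 missing positions, while the observed values of y are left untouched.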