Drawing from conditional distribution
Simply calling predict() on a model will always return the same value for the same values of the predictors. As a result, the imputed data show too little variability. To increase it, so that the imputation replicates the variability of the original data, we can draw from the conditional distribution. This means that instead of always predicting 1 whenever the model outputs a probability of 0.5 or more, we draw the prediction from a binomial distribution parameterized by the probability returned by the model.
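The difference between thresholding and drawing can be seen on a toy vector of probabilities. The sketch below is only an illustration: the probability 0.7, the sample size, and the seed are made-up values, not taken from the course data.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical case: 1000 observations with identical predictors,
# so the model returns the same probability for each of them
probs <- rep(0.7, 1000)

# Thresholding: every observation receives the same imputed value
thresholded <- ifelse(probs >= 0.5, 1, 0)

# Drawing: each observation is 1 with probability 0.7, so both classes appear
drawn <- rbinom(length(probs), size = 1, prob = probs)

table(thresholded)  # a single class
table(drawn)        # a mix of 0s and 1s
mean(drawn)         # close to 0.7
```

With thresholding, the share of 1s among these observations jumps to 100%. With drawing, it stays close to the modeled probability, which is what preserves the variability of the original data.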
You will work on the code you have written in the previous exercise. The following line was removed:
preds <- ifelse(preds >= 0.5, 1, 0)
Your task is to replace it with a draw from a binomial distribution. That's just one line of code!
This exercise is part of the course
Handling Missing Data with Imputations in R
Exercise instructions
- Overwrite preds by sampling from a binomial distribution.
- Pass the length of preds as the first argument.
- Set size to 1.
- Set prob to the probabilities returned by the model.
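To see how the three arguments interact, here is a small sketch using base R's rbinom() on a made-up probability vector (the values are hypothetical, chosen so that most draws are deterministic). The first argument is the number of draws, size = 1 makes each draw a single 0/1 trial, and prob is matched to the draws element by element.

```r
# Hypothetical probabilities; 0 and 1 force the outcome of the matching draw
probs <- c(0, 1, 0.5, 1, 0)

# One Bernoulli draw per probability
draws <- rbinom(n = length(probs), size = 1, prob = probs)

draws[c(1, 2, 4, 5)]  # always 0 1 1 0; only the third draw is random
```

Because prob is applied per element, each observation is sampled with its own predicted probability rather than a shared one.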
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
impute_logreg <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Save locations where the response is missing
  missing_imp_var <- is.na(df[imp_var])
  # Fit logistic regression model
  logreg_model <- glm(formula, data = df, family = binomial)
  # Predict the response
  preds <- predict(logreg_model, type = "response")
  # Sample the predictions from binomial distribution
  preds <- ___(___, size = ___, prob = ___)
  # Impute missing values with predictions
  df[missing_imp_var, imp_var] <- preds[missing_imp_var]
  return(df)
}
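For trying the idea outside the exercise environment, the following is a minimal, self-contained sketch on simulated data. It is not the course solution. The toy data frame, the function name impute_logreg_sketch, and the seed are all hypothetical. It also deviates from the template in one respect: it passes newdata = df to predict(). Without newdata, predict() returns fitted values only for the rows used in fitting, and glm() drops rows with a missing response by default, so this sketch assumes you want one probability per row of df.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Toy data (hypothetical): binary response y depends on predictor x
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(1.5 * toy$x))
toy$y[sample(n, 40)] <- NA  # remove 40 responses at random

impute_logreg_sketch <- function(df, formula) {
  # Extract name of response variable
  imp_var <- as.character(formula[2])
  # Logical vector marking rows where the response is missing
  missing_imp_var <- is.na(df[[imp_var]])
  # Fit logistic regression model (rows with missing response are dropped)
  logreg_model <- glm(formula, data = df, family = binomial)
  # Assumption: newdata = df gives one probability per row of df
  probs <- predict(logreg_model, newdata = df, type = "response")
  # Draw each imputed value from a binomial distribution with size 1
  draws <- rbinom(length(probs), size = 1, prob = probs)
  # Fill in only the originally missing entries
  df[missing_imp_var, imp_var] <- draws[missing_imp_var]
  df
}

imputed <- impute_logreg_sketch(toy, y ~ x)
table(imputed$y[is.na(toy$y)])  # imputed values contain both classes
```

Running the function twice without resetting the seed will generally give different imputed values for the same rows, which is the intended source of variability.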