The mice flow: mice - with - pool
Multiple imputation by chained equations, or MICE, lets us estimate the uncertainty due to imputation by imputing a data set multiple times with model-based imputation, drawing the imputed values from conditional distributions. As a result, each imputed data set is slightly different. An analysis is then conducted on each of them, and the results are pooled to yield the quantities of interest together with confidence intervals that reflect the imputation uncertainty.
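The pooling step is commonly carried out with Rubin's rules. The block below is a sketch of those rules for a single scalar quantity, such as one regression coefficient. The notation is an assumption introduced here for illustration and does not appear in the exercise: m is the number of imputations, Q_i is the estimate from the i-th imputed data set, and U_i is its squared standard error.

```latex
% Pooled point estimate: the average of the m per-imputation estimates
\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} Q_i

% Within-imputation variance: the average of the squared standard errors
\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i

% Between-imputation variance: how much the estimates differ across imputations
B = \frac{1}{m-1} \sum_{i=1}^{m} \left( Q_i - \bar{Q} \right)^2

% Total variance of the pooled estimate
T = \bar{U} + \left( 1 + \frac{1}{m} \right) B
```

The between-imputation term B is what carries the imputation uncertainty into the pooled standard error, and hence into the confidence intervals: the more the imputed data sets disagree, the wider the intervals.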
In this exercise, you will practice the typical MICE flow: mice() - with() - pool(). You will perform a regression analysis on the biopics data to see which subject occupation, sub_type, is associated with the highest movie earnings. Let's play with mice!
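To see the shape of the flow before attempting the exercise, here is a minimal sketch on a different data set. It assumes the nhanes data frame that ships with the mice package (columns age, bmi, hyp, chl) rather than biopics, and the model formula and the object names nhanes_multiimp and lm_summary are chosen purely for illustration.

```r
# Sketch of the mice() - with() - pool() flow on the nhanes data bundled with mice
# (an illustrative stand-in for biopics, not the exercise data)
library(mice)

# Impute the data 5 times; the seed makes the random draws reproducible
nhanes_multiimp <- mice(nhanes, m = 5, seed = 3108, printFlag = FALSE)

# Fit the same linear regression to each of the 5 imputed data sets
lm_multiimp <- with(nhanes_multiimp, lm(bmi ~ age + chl))

# Pool the 5 fitted models into a single set of estimates
lm_pooled <- pool(lm_multiimp)

# Summarize the pooled coefficients with 95% confidence intervals
lm_summary <- summary(lm_pooled, conf.int = TRUE, conf.level = 0.95)
print(lm_summary)
```

Each step consumes the object produced by the previous one: mice() returns the multiply imputed data, with() returns one fitted model per imputation, and pool() combines those fits so that summary() can report intervals reflecting the imputation uncertainty.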
This exercise is part of the course
Handling Missing Data with Imputations in R
Exercise instructions
- Load the mice package and impute biopics with mice() using 5 imputations, assigning the result to biopics_multiimp.
- Fit a linear regression model that explains earnings using year and sub_type to each imputed data set, assigning the result to lm_multiimp.
- Pool the regression models saved in lm_multiimp together, assigning the result to lm_pooled.
- Summarize lm_pooled such that it produces confidence intervals with a 95% confidence level.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load mice package
___
# Impute biopics with mice using 5 imputations
biopics_multiimp <- ___(___, m = ___, seed = 3108)
# Fit linear regression to each imputed data set
lm_multiimp <- ___(___, ___)
# Pool and summarize regression results
lm_pooled <- ___(___)
___(___, conf.int = ___, conf.level = ___)