Wrapping imputation & modeling in a function
Whenever you analyze or model imputed data, you should account for the uncertainty introduced by imputation. Fitting a model to a dataset that was imputed only once treats the imputed values as if they had been observed, so the standard errors from such a model tend to be too small. The remedy is multiple imputation, and one way to implement it is bootstrapping.
In the upcoming exercises, you will work with the familiar biopics data. The goal is to use multiple imputation by bootstrapping and linear regression to see if, based on the data at hand, biographical movies featuring females earn less than those about males.
Let's start by writing a function that creates a bootstrap sample, imputes it, and fits a linear regression model.
This exercise is part of the course
Handling Missing Data with Imputations in R
Exercise instructions
- Slice `data` to resample rows indicated by `indices` and assign the result to `data_boot`.
- Impute the bootstrap sample `data_boot` with kNN imputation using 5 neighbors and assign the result to `data_imp`.
- Fit a linear regression model to `data_imp` that explains `earnings` with `sub_sex`, `sub_type` and `year`.
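The first step relies on indexing rows with a vector of indices that may contain repeats. A minimal base R sketch of that idea, using a small made-up data frame (`toy` and its values are hypothetical, not the biopics data):

```r
# Hypothetical toy data; not the biopics dataset
toy <- data.frame(id = 1:5, earnings = c(10, NA, 30, 40, NA))

set.seed(1)
# Draw row indices with replacement; a bootstrap routine would pass
# a vector like this to the function as `indices`
indices <- sample(nrow(toy), replace = TRUE)

# Some rows may appear more than once and others not at all,
# but the resample has as many rows as the original data
toy_boot <- toy[indices, ]
nrow(toy_boot) == nrow(toy)
```

Because missing values travel with their rows, each bootstrap sample has its own pattern of missingness to impute.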
Interactive exercise
Complete the sample code to successfully finish this exercise.
calc_gender_coef <- function(data, indices) {
  # Get bootstrap sample
  data_boot <- data[___, ]
  # Impute with kNN imputation
  data_imp <- ___
  # Fit linear regression
  linear_model <- ___
  # Extract and return gender coefficient
  gender_coefficient <- coef(linear_model)[2]
  return(gender_coefficient)
}
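One plausible way to fill in the blanks is sketched here. It assumes that `kNN()` from the VIM package is the intended imputation function (the instructions only say "kNN imputation using 5 neighbors"). Since the biopics data is not included, the function is tried on a synthetic data frame, `toy`, whose column names match the exercise but whose values are invented.

```r
library(VIM)  # assumed source of kNN()

calc_gender_coef <- function(data, indices) {
  # Get bootstrap sample
  data_boot <- data[indices, ]
  # Impute with kNN imputation, 5 neighbors
  data_imp <- kNN(data_boot, k = 5)
  # Fit linear regression
  linear_model <- lm(earnings ~ sub_sex + sub_type + year, data = data_imp)
  # Extract and return gender coefficient (second term, after the intercept)
  gender_coefficient <- coef(linear_model)[2]
  return(gender_coefficient)
}

# Synthetic stand-in for the biopics data (values are made up)
set.seed(42)
n <- 60
toy <- data.frame(
  earnings = rnorm(n, mean = 50, sd = 10),
  sub_sex  = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  sub_type = factor(sample(c("Artist", "Athlete", "Other"), n, replace = TRUE)),
  year     = sample(1990:2014, n, replace = TRUE)
)
toy$earnings[sample(n, 10)] <- NA  # introduce missing values

# One bootstrap replicate of the gender coefficient
coef_one <- calc_gender_coef(toy, sample(n, replace = TRUE))
coef_one
```

The `(data, indices)` signature matches what `boot::boot()` expects for its `statistic` argument, so the function can be handed to `boot()` to repeat the resample, impute, and fit cycle many times.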