t-test for MAR: data preparation
Great work on classifying the missing data mechanisms in the last exercise! Of all three, MAR is arguably the most important one to detect, as many imputation methods assume the data are MAR. This exercise will, therefore, focus on testing for MAR.
You will be working with the familiar biopics data. The goal is to test whether the proportion of missing values in earnings differs by the subject's gender. In this exercise, you will only prepare the data for the t-test. First, you will create a dummy variable indicating missingness in earnings. Then, you will split it by gender: filter the data to keep one of the genders, and then pull the dummy variable. For filtering, it may help to print head(biopics) in the console and examine the gender variable.
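The two preparation steps can be sketched in base R on a small made-up data frame. This is only an illustration: the data frame toy and its column gender are hypothetical stand-ins for biopics and its actual gender column, whose name you should check with head(biopics). Base R indexing here plays the role of filter() followed by pull().

```r
# Hypothetical toy data standing in for biopics; the column names
# `earnings` and `gender` are assumptions for illustration only.
toy <- data.frame(
  earnings = c(12.5, NA, 3.1, NA, 40.0, NA),
  gender   = c("Male", "Female", "Male", "Male", "Female", "Female")
)

# Dummy variable: TRUE where earnings is missing, FALSE otherwise
toy$missing_earnings <- is.na(toy$earnings)

# Split the dummy by gender (same effect as filter() followed by pull())
missing_earnings_males   <- toy$missing_earnings[toy$gender == "Male"]
missing_earnings_females <- toy$missing_earnings[toy$gender == "Female"]

print(missing_earnings_males)    # FALSE FALSE TRUE
print(missing_earnings_females)  # TRUE FALSE TRUE
```

Each result is a plain logical vector, which is the shape the t-test expects as its two samples.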
This exercise is part of the course Handling Missing Data with Imputations in R.
Exercise instructions
- Add another variable to biopics called missing_earnings that is TRUE if earnings is missing and FALSE otherwise.
- Create a vector of missing_earnings values for males and assign it to missing_earnings_males.
- Create a vector of missing_earnings values for females and assign it to missing_earnings_females.
Have a go at this exercise by completing this sample code.
# dplyr supplies the %>% pipe and the verbs used below
library(dplyr)

# Create a dummy variable for missing earnings
biopics <- biopics %>%
  ___(missing_earnings = ___(___))

# Pull the missing earnings dummy for males
missing_earnings_males <- biopics %>%
  ___(___) %>%
  ___(___)

# Pull the missing earnings dummy for females
missing_earnings_females <- biopics %>%
  ___(___) %>%
  ___(___)
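Once the two dummy vectors exist, they would typically be compared with a two-sample t-test. The sketch below uses made-up vectors rather than the real biopics results, and calls base R's t.test(), which runs a Welch test by default (var.equal = FALSE). Converting the logical vectors with as.numeric() maps TRUE/FALSE to 1/0, so each group mean is the share of missing earnings in that group.

```r
# Hypothetical dummy vectors; the values are made up for illustration and
# are NOT the actual biopics results.
missing_earnings_males   <- c(FALSE, FALSE, TRUE, FALSE, TRUE)
missing_earnings_females <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Welch two-sample t-test comparing the share of missing earnings by gender
res <- t.test(as.numeric(missing_earnings_males),
              as.numeric(missing_earnings_females))

res$estimate  # group means, i.e. the share of missing earnings per gender
res$p.value   # a small value suggests missingness is related to gender
```

If the p-value is small, missingness in earnings appears to depend on an observed variable (gender), which is evidence against MCAR and consistent with MAR.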