Exercise

t-test for MAR: data preparation

Great work on classifying the missing data mechanisms in the last exercise! Of all three, MAR is arguably the most important one to detect, as many imputation methods assume the data are MAR. This exercise will, therefore, focus on testing for MAR.

You will be working with the familiar biopics data. The goal is to test whether the proportion of missing values in earnings differs by the subject's gender. In this exercise, you will only prepare the data for the t-test. First, you will create a dummy variable indicating missingness in earnings. Then, you will split it by gender: filter the data to keep one gender at a time, and pull the dummy variable from the filtered data. Before filtering, it might be helpful to print the head() of biopics in the console and examine how the gender variable is coded.

Instructions
  • Add another variable to biopics called missing_earnings that is TRUE if earnings is missing and FALSE otherwise.
  • Create a vector of missing_earnings values for males and assign it to missing_earnings_males.
  • Create a vector of missing_earnings values for females and assign it to missing_earnings_females.
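
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the reference solution: it assumes the dplyr package is available, it uses a small made-up stand-in for biopics, and it assumes the gender column is called sub_sex with the values "Male" and "Female". Check head(biopics) for the actual column name and coding before filtering.

```r
library(dplyr)

# Toy stand-in for the biopics data (made-up values, for illustration only).
# Assumption: the gender column is named sub_sex and coded "Male"/"Female".
biopics <- data.frame(
  sub_sex  = c("Male", "Female", "Male", "Female", "Male"),
  earnings = c(12.3, NA, NA, 4.5, 56.6)
)

# Step 1: dummy variable that is TRUE where earnings is missing
biopics <- biopics %>%
  mutate(missing_earnings = is.na(earnings))

# Step 2: keep only males, then pull the dummy out as a plain logical vector
missing_earnings_males <- biopics %>%
  filter(sub_sex == "Male") %>%
  pull(missing_earnings)

# Step 3: same for females
missing_earnings_females <- biopics %>%
  filter(sub_sex == "Female") %>%
  pull(missing_earnings)
```

Because the dummy is logical, the mean of each vector is the share of missing earnings in that group, which is the quantity a subsequent t-test would compare.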