
t-test for MAR: data preparation

Great work on classifying the missing data mechanisms in the last exercise! Of the three mechanisms, missing at random (MAR) is arguably the most important to detect, because many imputation methods assume the data are MAR. This exercise therefore focuses on testing for MAR.

You will be working with the familiar biopics data. The goal is to test whether the share of missing values in earnings differs by the subject's gender. In this exercise, you will only prepare the data for the t-test. First, you will create a dummy variable indicating missingness in earnings. Then, you will split it by gender: filter the data to keep one gender, then pull the dummy variable out as a vector. For filtering, it might be helpful to print head(biopics) in the console and examine the gender variable.
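To see why a TRUE/FALSE dummy is all the t-test needs, here is a minimal sketch of how two such vectors could be compared. The vectors and their names (`missing_males`, `missing_females`) are made up for illustration and do not come from biopics. The test itself is outside the scope of this exercise, so treat this as a preview under those assumptions.

```r
# Hypothetical dummy vectors (TRUE = earnings missing); values are made up
missing_males   <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
missing_females <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

# The mean of a logical vector is the proportion of TRUE values,
# i.e. the share of missing earnings in each group
share_males   <- mean(missing_males)
share_females <- mean(missing_females)

# Welch two-sample t-test comparing the two shares;
# as.numeric() makes the 0/1 coding explicit
result <- t.test(as.numeric(missing_males), as.numeric(missing_females))
result$p.value
```

A small p-value would suggest that missingness in earnings depends on gender, which is consistent with MAR. With only six made-up values per group, the result here carries no real meaning.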

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • Add another variable to biopics called missing_earnings that is TRUE if earnings is missing and FALSE otherwise.
  • Create a vector of missing_earnings values for males and assign it to missing_earnings_males.
  • Create a vector of missing_earnings values for females and assign it to missing_earnings_females.
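The three steps above can be sketched on a small made-up data frame. This is an illustration rather than the checked solution: the toy data, the column name `sub_sex`, and its values "Male" and "Female" are assumptions, so check head(biopics) for the actual gender variable and its levels.

```r
library(dplyr)

# Toy stand-in for biopics; column names and values are assumptions
toy <- data.frame(
  sub_sex  = c("Male", "Female", "Male", "Female", "Male"),
  earnings = c(12.3, NA, NA, 4.5, 56.0)
)

# Step 1: dummy variable that is TRUE where earnings is missing
toy <- toy %>%
  mutate(missing_earnings = is.na(earnings))

# Step 2: keep males only, then extract the dummy as a plain vector
missing_earnings_males <- toy %>%
  filter(sub_sex == "Male") %>%
  pull(missing_earnings)

# Step 3: same for females
missing_earnings_females <- toy %>%
  filter(sub_sex == "Female") %>%
  pull(missing_earnings)
```

Using pull() instead of select() matters here: pull() returns a bare logical vector, while select() would return a one-column data frame.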

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a dummy variable for missing earnings
biopics <- biopics %>% 
  ___(missing_earnings = ___(___))

# Pull the missing earnings dummy for males
missing_earnings_males <- biopics %>% 
  ___(___) %>% 
  ___(___)

# Pull the missing earnings dummy for females
missing_earnings_females <- biopics %>% 
  ___(___) %>% 
  ___(___)