
Looking for Predictable Missingness

If data are missing completely at random (MCAR), you should not be able to predict from the rest of the data whether a variable is missing. Conversely, if you can predict missingness, the data are not MCAR. So let's use the glm() function to fit a logistic regression that predicts missingness from the affordability variable in the mort data you created earlier. If you find no structure in the missing data, i.e., none of the slope coefficients is significant, you have not proven that the data are missing completely at random, but it is a plausible assumption.
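The logic can be checked on simulated data before touching the mortgage data. The sketch below uses base R only; the variable names (`group`, `miss_mcar`, `miss_dep`) and the missingness rates are invented for illustration and do not come from the mortgage data. It fits the same kind of logistic regression to two missingness indicators: one that is MCAR and one whose probability of being missing rises with the covariate.

```r
# Simulated illustration only: names and rates are hypothetical.
set.seed(42)
n <- 5000
group <- factor(sample(0:3, n, replace = TRUE))

# MCAR: every row has the same 10% chance of being missing
miss_mcar <- runif(n) < 0.10

# Not MCAR: the chance of missingness grows with the group level
p_dep <- c(0.05, 0.10, 0.20, 0.40)[as.integer(group)]
miss_dep <- runif(n) < p_dep

# Logistic regression of each missingness indicator on the covariate
fit_mcar <- glm(miss_mcar ~ group, family = binomial)
fit_dep  <- glm(miss_dep ~ group, family = binomial)

# Wald p-values of the slope (non-intercept) coefficients
pvals <- function(fit) coef(summary(fit))[-1, "Pr(>|z|)"]
pvals(fit_mcar)
pvals(fit_dep)
```

For `fit_dep` the slope p-values should be tiny, because missingness depends on `group`. For `fit_mcar` they will usually be large, although each individual test still has roughly a 5% chance of coming out significant by chance. Because a factor produces several slopes, `anova(fit, test = "Chisq")` gives a single likelihood-ratio test for the whole term instead of several separate p-values.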

This exercise is part of the course

Scalable Data Processing in R


Exercise instructions

  • Create a variable indicating whether "borrower_race" is missing (equal to 9) in the mortgage data.
  • Create a factor variable from the "affordability" column.
  • Regress borrower_race_ind on affordability_factor and call summary() on the result.
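Why convert affordability to a factor? Assuming the column holds a handful of integer category codes, passing it as a plain number makes glm() fit a single linear slope across those codes, while a factor gives each non-baseline level its own coefficient. The toy vector below is invented for illustration; `model.matrix()` shows the predictor columns each version produces.

```r
# Hypothetical affordability codes, invented for illustration
affordability <- c(0, 1, 2, 3, 4, 2, 1, 0)

# As a number: one predictor column, i.e. one linear slope
colnames(model.matrix(~ affordability))
# → "(Intercept)" "affordability"

# As a factor: one indicator column per non-baseline level
colnames(model.matrix(~ factor(affordability)))
# → "(Intercept)" "factor(affordability)1" "factor(affordability)2"
#   "factor(affordability)3" "factor(affordability)4"
```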

Interactive exercise

Try this exercise by completing the sample code.

# Create a variable indicating if borrower_race is missing in the mortgage data
borrower_race_ind <- mort[, ___] == 9

# Create a factor variable from the affordability column
affordability_factor <- ___(mort[, ___])

# Perform a logistic regression
___(glm(___ ~ affordability_factor, family = binomial))
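One way the blanks could be filled is sketched below. So that it runs on its own, it first builds a small hypothetical base R matrix named `mort` with invented `affordability` and `borrower_race` codes; with the `mort` object from earlier in the course, only the statements after the stand-in should be needed.

```r
# Hypothetical stand-in for `mort`: the codes below are invented,
# except that 9 marks a missing borrower_race, as in the exercise.
set.seed(1)
n <- 2000
mort <- cbind(
  affordability = sample(0:4, n, replace = TRUE),
  borrower_race = sample(c(1:7, 9), n, replace = TRUE)
)

# TRUE where borrower_race holds the missing-value code 9
borrower_race_ind <- mort[, "borrower_race"] == 9

# Treat affordability as categorical rather than numeric
affordability_factor <- factor(mort[, "affordability"])

# Logistic regression of missingness on affordability
fit <- glm(borrower_race_ind ~ affordability_factor, family = binomial)
summary(fit)
```

Because the stand-in draws race codes independently of affordability, the slopes here should mostly be non-significant, which is what MCAR missingness looks like in this test.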