
Handling missing data

Some of the prospective donors have missing age data. Unfortunately, by default R excludes any cases with NA values when building a regression model.
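A minimal sketch of this behavior, assuming a made-up data frame named toy (the column names and values are hypothetical and not taken from donors). It uses lm() for simplicity and relies on R's default na.action setting, na.omit:

```r
# Hypothetical data: 6 cases, 2 of them with missing age
toy <- data.frame(
  donated = c(0, 1, 0, 1, 1, 0),
  age     = c(25, NA, 40, 52, NA, 33)
)

# With the default na.action (na.omit), rows containing NA are dropped
fit <- lm(donated ~ age, data = toy)

nrow(toy)   # 6 rows in the data
nobs(fit)   # 4 rows actually used to fit the model
```

Comparing nrow() on the data with nobs() on the fitted model is a quick way to check how many cases were silently dropped.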

One workaround is to replace, or impute, the missing values with an estimated value. After doing so, you may also create a missing data indicator to model the possibility that cases with missing data are different in some way from those without.
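The idea can be sketched on a small made-up data frame. The name toy and its values are assumptions for illustration only; mean imputation is just one possible choice of estimate:

```r
# Hypothetical data: ages with two missing values
toy <- data.frame(age = c(25, NA, 40, 52, NA, 33))

# Estimate used for imputation: the mean of the non-missing ages
mean_age <- round(mean(toy$age, na.rm = TRUE), 2)

# Replace missing ages with the estimate, keep observed ages as they are
toy$imputed_age <- ifelse(is.na(toy$age), mean_age, toy$age)

# Indicator: 1 where the original age was missing, 0 otherwise
toy$missing_age <- ifelse(is.na(toy$age), 1, 0)

print(toy)
```

Keeping both columns lets a model use the imputed value while still allowing cases with originally missing data to behave differently.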

The data frame donors is loaded in your workspace.

This exercise is part of the course

Supervised Learning in R: Classification


Exercise instructions

  • Use summary() on donors$age to find the average age of prospects with non-missing data.
  • Use ifelse() with the test is.na(donors$age) to impute the average (rounded to 2 decimal places) for cases with missing age. Be sure the average itself is computed with NAs ignored.
  • Create a binary dummy variable named missing_age indicating the presence of missing data using another ifelse() call and the same test.
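As a rough illustration of why NAs matter when computing the average, the sketch below uses a made-up vector (the name ages and its values are hypothetical, not taken from donors):

```r
# Hypothetical ages; not taken from the donors data
ages <- c(25, NA, 40, 52, NA, 33)

# summary() computes its statistics on the non-missing values
# and reports the number of NAs separately
age_summary <- summary(ages)
print(age_summary)

# mean() returns NA unless told to drop missing values
mean(ages)                 # NA
mean(ages, na.rm = TRUE)   # 37.5
```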

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Find the average age among non-missing values
summary(___)

# Impute missing age values with the mean age
donors$imputed_age <- ifelse(___)

# Create missing value indicator for age
donors$missing_age <- ___