Handling missing data
Some of the prospective donors have missing age data. Unfortunately, by default R excludes any case with an NA value in a model variable when building a regression model.
One workaround is to replace, or impute, the missing values with an estimated value. After doing so, you may also create a missing data indicator to model the possibility that cases with missing data are different in some way from those without.
The data frame donors is loaded in your workspace.
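To make the default NA handling described above concrete, here is a minimal sketch. The toy data frame and its values are hypothetical and stand in for donors; the logistic regression via glm() is an assumption about the kind of regression model being fit.

```r
# Hypothetical toy data standing in for donors (not the course data)
toy <- data.frame(
  donated = c(0, 1, 1, 0, 0, 1),
  age     = c(25, NA, 40, 62, NA, 51)
)

# With R's default na.action (na.omit), rows with NA in age are dropped
fit <- glm(donated ~ age, data = toy, family = "binomial")

nrow(toy)   # number of rows in the data
nobs(fit)   # number of rows actually used to fit the model
```

Comparing nrow() with nobs() is a quick way to see how many cases a model silently discarded.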
This exercise is part of the course
Supervised Learning in R: Classification
Exercise instructions
- Use summary() on donors$age to find the average age of prospects with non-missing data.
- Use ifelse() and the test is.na(donors$age) to impute the average (rounded to 2 decimal places) for cases with missing age. Be sure to ignore NAs when computing the average.
- Create a binary dummy variable named missing_age indicating the presence of missing data using another ifelse() call and the same test.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Find the average age among non-missing values
summary(___)
# Impute missing age values with the mean age
donors$imputed_age <- ifelse(___)
# Create missing value indicator for age
donors$missing_age <- ___
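
The same pattern can be sketched on a small hypothetical data frame. The name toy, its values, and the helper variable mean_age are illustrative assumptions, not part of the exercise data, and mean() with na.rm = TRUE is used here in place of reading the average off summary().

```r
# Hypothetical toy data (not the course's donors data frame)
toy <- data.frame(age = c(25, NA, 40, 62, NA, 51))

# Mean of the non-missing ages, rounded to 2 decimal places
mean_age <- round(mean(toy$age, na.rm = TRUE), 2)

# Replace missing ages with the mean; keep observed ages unchanged
toy$imputed_age <- ifelse(is.na(toy$age), mean_age, toy$age)

# 1 if age was originally missing, 0 otherwise
toy$missing_age <- ifelse(is.na(toy$age), 1, 0)

toy
```

Keeping both columns lets a later model use imputed_age without losing rows, while missing_age captures any systematic difference between cases with and without a recorded age.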