Coding categorical features
Sometimes a dataset contains numeric values that represent a categorical feature.
In the donors dataset, wealth_rating uses numbers to indicate the donor's wealth level:
- 0 = Unknown
- 1 = Low
- 2 = Medium
- 3 = High
This exercise illustrates how to prepare this type of categorical feature and examines its impact on a logistic regression model. The donors data frame is available for you to use.
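As a minimal sketch of the idea, the snippet below converts a handful of numeric codes into a labeled factor. The toy vector is hypothetical and is not taken from the donors data; only the code-to-label mapping comes from the list above.

```r
# Hypothetical toy ratings for illustration (not the real donors data)
wealth_rating <- c(0, 3, 1, 2, 2, 0)

# Map each numeric code to its label; levels and labels are matched by position
wealth_levels <- factor(
  wealth_rating,
  levels = c(0, 1, 2, 3),
  labels = c("Unknown", "Low", "Medium", "High")
)

# The first level listed is the default reference category in a model
levels(wealth_levels)

# Count how many observations fall in each category
table(wealth_levels)
```

Because the values are now a factor rather than numbers, a model will estimate a separate effect for each category instead of assuming that each one-unit step in the code has the same effect.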
This exercise is part of the course Supervised Learning in R: Classification
Exercise instructions
- Create a factor wealth_levels from the numeric wealth_rating with labels as shown by passing the factor() function the column you want to convert, the individual levels, and the labels.
- Use relevel() to change the reference category to Medium. The first argument should be your new factor column.
- Build a logistic regression model using the column wealth_levels to predict donated and display the result with summary().
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Convert the wealth rating to a factor
donors$wealth_levels <- ___(___, levels = ___, labels = ___)
# Use relevel() to change reference category
donors$wealth_levels <- ___(___, ref = ___)
# See how our factor coding impacts the model
summary(___)
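
For reference, here is one way the same three steps might look on a simulated stand-in data frame. The name toy_donors and all of its values are made up for illustration, so the fitted coefficients say nothing about the real donors data; the sketch only shows how factor(), relevel(), and glm() fit together.

```r
set.seed(42)

# Simulated stand-in for the donors data frame (hypothetical values)
n <- 500
toy_donors <- data.frame(
  wealth_rating = sample(0:3, n, replace = TRUE),
  donated = rbinom(n, size = 1, prob = 0.1)
)

# Convert the numeric rating to a labeled factor
toy_donors$wealth_levels <- factor(
  toy_donors$wealth_rating,
  levels = c(0, 1, 2, 3),
  labels = c("Unknown", "Low", "Medium", "High")
)

# Make "Medium" the reference category
toy_donors$wealth_levels <- relevel(toy_donors$wealth_levels, ref = "Medium")

# Logistic regression: one coefficient per non-reference category
toy_model <- glm(donated ~ wealth_levels, data = toy_donors, family = "binomial")
summary(toy_model)
```

With Medium as the reference, each remaining coefficient (Unknown, Low, High) is a difference in log-odds of donating relative to the Medium group.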