Coding categorical features
Sometimes a dataset contains numeric values that represent a categorical feature.
In the donors dataset, wealth_rating uses numbers to indicate the donor's wealth level:
- 0 = Unknown
- 1 = Low
- 2 = Medium
- 3 = High
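To see how the levels and labels arguments of factor() line up, here is a minimal sketch on a made-up vector of codes (the values are hypothetical, not drawn from donors). Each numeric code in levels is matched to the label in the same position. Because Unknown comes first, R would treat it as the reference category unless that is changed.

```r
# Hypothetical sample of numeric wealth codes (not taken from the donors data)
wealth_rating <- c(0, 3, 1, 2, 2)

# levels lists the numeric codes; labels gives the name for each code, in the same order
wealth_levels <- factor(wealth_rating,
                        levels = c(0, 1, 2, 3),
                        labels = c("Unknown", "Low", "Medium", "High"))

print(wealth_levels)

# Unknown is the first level, so it would be the default reference category
print(levels(wealth_levels))
```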
This exercise illustrates how to prepare this type of categorical feature and examines its impact on a logistic regression model. The donors data frame is available for you to use.
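The coding choice matters because of how R turns a predictor into model columns. As a rough sketch on a hypothetical four-row data frame, model.matrix() shows that the numeric wealth_rating enters as a single column (one slope, which implicitly assumes equal steps between 0, 1, 2 and 3), while the factor version becomes one 0/1 indicator column per non-reference level, each with its own coefficient.

```r
# Hypothetical toy data frame (not the real donors data)
toy <- data.frame(wealth_rating = c(0, 1, 2, 3))
toy$wealth_levels <- factor(toy$wealth_rating,
                            levels = c(0, 1, 2, 3),
                            labels = c("Unknown", "Low", "Medium", "High"))

# Numeric coding: a single column, so the model fits one slope for each one-step increase
mm_numeric <- model.matrix(~ wealth_rating, data = toy)
print(mm_numeric)

# Factor coding: one indicator column per non-reference level (Unknown is the reference)
mm_factor <- model.matrix(~ wealth_levels, data = toy)
print(mm_factor)
```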
This exercise is part of the course Supervised Learning in R: Classification.
Exercise instructions
- Create a factor wealth_levels from the numeric wealth_rating, with the labels shown above, by passing the factor() function the column you want to convert, the individual levels, and the labels.
- Use relevel() to change the reference category to Medium. The first argument should be your new factor column.
- Build a logistic regression model that uses the column wealth_levels to predict donated, and display the result with summary().
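relevel() only reorders the levels so that the chosen one comes first; the data values themselves stay the same. A small sketch, again on hypothetical codes, showing the level order before and after, and the indicator columns that result (each one compares a level against Medium):

```r
# Hypothetical codes, converted with the same levels and labels as in the exercise
wealth_levels <- factor(c(0, 3, 1, 2),
                        levels = c(0, 1, 2, 3),
                        labels = c("Unknown", "Low", "Medium", "High"))
before <- levels(wealth_levels)

# Move "Medium" to the front; the other levels keep their relative order
wealth_levels <- relevel(wealth_levels, ref = "Medium")
after <- levels(wealth_levels)

print(before)
print(after)

# With Medium as the reference, the indicator columns are Unknown, Low and High
dummy_cols <- colnames(model.matrix(~ wealth_levels))
print(dummy_cols)
```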
Have a go at this exercise by completing this sample code.
# Convert the wealth rating to a factor
donors$wealth_levels <- ___(___, levels = ___, labels = ___)
# Use relevel() to change reference category
donors$wealth_levels <- ___(___, ref = ___)
# See how our factor coding impacts the model
summary(___)
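One way to fill in the blanks is sketched below. The course's donors data frame is not reproduced here, so the sketch builds a hypothetical stand-in with random values (only the column names wealth_rating and donated match the exercise; everything else is made up) purely so the code runs on its own. In the exercise itself, drop the simulation block and use the provided donors. With Medium as the reference, each wealth_levels coefficient in the summary is the difference in log-odds of donating relative to Medium-wealth donors.

```r
# Hypothetical stand-in for the course's donors data frame: random values only,
# so the fitted coefficients say nothing about real donors.
set.seed(42)
donors <- data.frame(
  wealth_rating = sample(0:3, size = 500, replace = TRUE),
  donated       = rbinom(500, size = 1, prob = 0.2)
)

# Convert the wealth rating to a factor
donors$wealth_levels <- factor(donors$wealth_rating,
                               levels = c(0, 1, 2, 3),
                               labels = c("Unknown", "Low", "Medium", "High"))

# Use relevel() to change reference category
donors$wealth_levels <- relevel(donors$wealth_levels, ref = "Medium")

# See how our factor coding impacts the model
wealth_model <- glm(donated ~ wealth_levels, data = donors, family = "binomial")
summary(wealth_model)
```

The intercept now describes the Medium group, and the Unknown, Low and High rows each report a contrast against it, which is usually easier to read than contrasts against an Unknown baseline.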