Coding categorical features

Sometimes a dataset contains numeric values that represent a categorical feature.

In the donors dataset, wealth_rating uses numbers to indicate the donor's wealth level:

  • 0 = Unknown
  • 1 = Low
  • 2 = Medium
  • 3 = High
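
The mapping above can be sketched with factor() on a small example. The vector below is made up for illustration and is not taken from the donors data; only the four codes and their labels come from the list above.

```r
# Hypothetical numeric codes, invented for this sketch (not the donors data)
wealth_rating <- c(0, 3, 1, 2, 2, 0)

# levels lists the numeric codes; labels gives the name for each code,
# matched by position (0 -> Unknown, 1 -> Low, 2 -> Medium, 3 -> High)
wealth_levels <- factor(wealth_rating,
                        levels = c(0, 1, 2, 3),
                        labels = c("Unknown", "Low", "Medium", "High"))

# Inspect the resulting categories and how often each occurs
levels(wealth_levels)
table(wealth_levels)
```

Because the labels are matched to the levels by position, the two vectors need to be in the same order.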

This exercise illustrates how to prepare this type of categorical feature and examines its impact on a logistic regression model. The donors data frame is available for you to use.

This exercise is part of the course

Supervised Learning in R: Classification

Exercise instructions

  • Create a factor wealth_levels from the numeric wealth_rating by passing factor() the column you want to convert, the individual levels, and the labels listed above.
  • Use relevel() to change the reference category to Medium. The first argument should be your new factor column.
  • Build a logistic regression model with glm() and family = "binomial", using the column wealth_levels to predict donated, and display the result with summary().
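
As a rough illustration of these three steps, the sketch below runs them on simulated data. The toy_donors data frame and toy_model are hypothetical names invented here, and the outcome is random noise, so the fitted coefficients carry no meaning; the point is only the shape of the calls and how the reference category affects the coefficient names.

```r
# Simulated stand-in for the donors data (an assumption for this sketch)
set.seed(42)
toy_donors <- data.frame(
  wealth_rating = sample(0:3, 500, replace = TRUE),
  donated       = rbinom(500, size = 1, prob = 0.3)
)

# Step 1: convert the numeric codes to a labelled factor
toy_donors$wealth_levels <- factor(toy_donors$wealth_rating,
                                   levels = c(0, 1, 2, 3),
                                   labels = c("Unknown", "Low", "Medium", "High"))

# Step 2: make "Medium" the reference category (it moves to the first level)
toy_donors$wealth_levels <- relevel(toy_donors$wealth_levels, ref = "Medium")
levels(toy_donors$wealth_levels)

# Step 3: fit a logistic regression; each remaining level gets its own
# coefficient, interpreted relative to the "Medium" reference
toy_model <- glm(donated ~ wealth_levels,
                 data = toy_donors,
                 family = "binomial")
summary(toy_model)
```

With Medium as the reference, the summary lists separate coefficients for Unknown, Low, and High, each compared against Medium. Had wealth_rating been left numeric, the model would instead estimate a single slope and treat the codes as evenly spaced quantities.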

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Convert the wealth rating to a factor
donors$wealth_levels <- ___(___, levels = ___, labels = ___)

# Use relevel() to change reference category
donors$wealth_levels <- ___(___, ref = ___)

# See how our factor coding impacts the model
summary(___)