Creating a factor variable

We can create a categorical variable from a continuous one, and there are many ways to do that. Here we use the variable crim (per capita crime rate by town) as the basis for our factor variable. We cut it at its quantiles so that low, medium-low, medium-high and high crime rates each fall into their own category.

See how it's done below!
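
As a minimal sketch of the idea, the snippet below bins a small made-up vector at its quantiles. The vector x and its values are hypothetical and only stand in for a continuous variable such as crim; quantile() and cut() are the same base R functions used in the exercise.

```r
# Hypothetical toy data standing in for a continuous variable
x <- c(0.1, 0.4, 0.7, 1.2, 2.5, 3.1, 4.8, 6.0, 9.3, 15.2)

# By default quantile() returns five cut points: minimum, quartiles, maximum
bins <- quantile(x)
bins

# Four intervals between the five cut points;
# include.lowest = TRUE keeps the minimum value inside the first interval
groups <- cut(x, breaks = bins, include.lowest = TRUE)

# Count how many observations fall into each interval
table(groups)
```

Without include.lowest = TRUE the smallest value would lie outside the first interval and become NA, which is why the exercise template sets it as well.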

This exercise is part of the course

Helsinki Open Data Science

Exercise instructions

Look at the summary of the scaled variable crim.
Use the function quantile() on the scaled crime rate variable and save the result to bins. Print the result.
Create the categorical vector crime with the cut() function. Set the breaks argument to the quantile vector you just created.
Use the function table() on the crime object.
Adjust the cut() call by adding the labels argument. Create a string vector with the values "low", "med_low", "med_high", "high" (in that order) and use it to set the labels.
Tabulate the crime object again.
Execute the last lines of code to remove the original crime rate variable and add the new one to the scaled Boston dataset.
NOTE! If you receive an error message regarding factors while submitting and you feel your solution is correct, try pressing the submit-button again without altering the code. This usually works. We are currently working on the problem.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# MASS, Boston and boston_scaled are available

# summary of the scaled crime rate


# create a quantile vector of crim and print it
bins <- quantile(boston_scaled$crim)
bins

# create a categorical variable 'crime'
crime <- cut(boston_scaled$crim, breaks = "change me!", include.lowest = TRUE)

# look at the table of the new factor crime


# remove original crim from the dataset
boston_scaled <- dplyr::select(boston_scaled, -crim)

# add the new categorical variable to the scaled data
boston_scaled <- data.frame(boston_scaled, crime)
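
For reference, one way the completed template could look is sketched below. It is not the official solution. It assumes that boston_scaled is the Boston data standardized with scale(), and it rebuilds that object so the sketch runs on its own. To avoid depending on dplyr, the column is dropped with base R indexing, which should be equivalent to the dplyr::select() call in the template.

```r
library(MASS)  # provides the Boston data set

# Assumption: boston_scaled is the standardized Boston data
boston_scaled <- as.data.frame(scale(Boston))

# summary of the scaled crime rate
summary(boston_scaled$crim)

# create a quantile vector of crim and print it
bins <- quantile(boston_scaled$crim)
bins

# create a categorical variable 'crime' with one label per quantile interval
crime <- cut(boston_scaled$crim, breaks = bins, include.lowest = TRUE,
             labels = c("low", "med_low", "med_high", "high"))

# look at the table of the new factor crime
table(crime)

# remove original crim from the dataset
# (base R stand-in for dplyr::select(boston_scaled, -crim))
boston_scaled <- boston_scaled[, names(boston_scaled) != "crim"]

# add the new categorical variable to the scaled data
boston_scaled <- data.frame(boston_scaled, crime)
```

Because the breaks are quartiles, the four categories should come out roughly equal in size.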
