Creating a factor variable
We can create a categorical variable from a continuous one, and there are many ways to do that. Here we choose the variable `crim` (per capita crime rate by town) as our factor variable.
We cut the variable at its quantiles so that low, medium-low, medium-high and high crime rates each get their own category.
See how it's done below!
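As a rough illustration of cutting at quantiles, here is a minimal sketch on a made-up vector. The vector `x` and the names `q`, `x_cat` and `n_missing` are hypothetical and are not part of the exercise.

```r
# Toy data: a hypothetical numeric vector, not the Boston data
x <- c(2, 7, 1, 9, 4, 6, 3, 8)

# By default quantile() returns the 0%, 25%, 50%, 75% and 100% points
q <- quantile(x)
q

# Five break points define four intervals.
# include.lowest = TRUE keeps the minimum value inside the first interval.
x_cat <- cut(x, breaks = q, include.lowest = TRUE)
table(x_cat)

# Without include.lowest the intervals are open on the left,
# so the smallest observation is not assigned to any category
n_missing <- sum(is.na(cut(x, breaks = q)))
n_missing
```

Because the break points are the quartiles, each of the four categories receives roughly a quarter of the observations.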
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Look at the summary of the scaled variable `crim`.
- Use the function `quantile()` on the scaled crime rate variable and save the results to `bins`. Print the results.
- Create a categorical crime vector with the `cut()` function. Set the `breaks` argument to be the quantile vector you just created.
- Use the function `table()` on the `crime` object.
- Adjust the code of `cut()` by adding the `labels` argument to the function call. Create a string vector with the values `"low"`, `"med_low"`, `"med_high"`, `"high"` (in that order) and use it to set the labels.
- Do the table of the `crime` object again.
- Execute the last lines of code to remove the original crime rate variable and add the new one to the scaled Boston dataset.
- NOTE! If you receive an error message regarding factors while submitting and you feel your solution is correct, try pressing the submit-button again without altering the code. This usually works. We are currently working on the problem.
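The effect of the `labels` step can be sketched on toy data. The vector `z` below is hypothetical and only stands in for a scaled variable. The sketch compares the default interval-style level names with the custom labels from the instructions.

```r
# Hypothetical standardised values, used only for illustration
z <- c(-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, -0.8, 2.0)
z_bins <- quantile(z)

# Without labels, the factor levels are written in interval notation
unlabelled <- cut(z, breaks = z_bins, include.lowest = TRUE)
levels(unlabelled)

# With labels, the levels get readable names, assigned from the
# lowest interval to the highest
labelled <- cut(z, breaks = z_bins, include.lowest = TRUE,
                labels = c("low", "med_low", "med_high", "high"))
levels(labelled)
table(labelled)
```

The label vector must have one entry per interval, which is one fewer than the number of break points.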
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# MASS, Boston and boston_scaled are available

# summary of the scaled crime rate

# create a quantile vector of crim and print it
bins <- quantile(boston_scaled$crim)
bins

# create a categorical variable 'crime'
crime <- cut(boston_scaled$crim, breaks = "change me!", include.lowest = TRUE)

# look at the table of the new factor crime

# remove original crim from the dataset
boston_scaled <- dplyr::select(boston_scaled, -crim)

# add the new categorical value to scaled data
boston_scaled <- data.frame(boston_scaled, crime)
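One possible completed version of the sample code is sketched below. It makes two assumptions that the exercise does not confirm: `boston_scaled` is rebuilt here by standardising every column of `Boston` with `scale()`, and the `crim` column is dropped with base R instead of `dplyr::select()` so that the sketch only depends on the MASS package.

```r
library(MASS)  # provides the Boston data set

# Assumption: boston_scaled is the column-wise standardised Boston data
boston_scaled <- as.data.frame(scale(Boston))

# summary of the scaled crime rate
summary(boston_scaled$crim)

# create a quantile vector of crim and print it
bins <- quantile(boston_scaled$crim)
bins

# create a categorical variable 'crime' with readable labels
crime <- cut(boston_scaled$crim, breaks = bins, include.lowest = TRUE,
             labels = c("low", "med_low", "med_high", "high"))

# look at the table of the new factor crime
table(crime)

# remove original crim from the dataset
# (base-R stand-in for dplyr::select(boston_scaled, -crim))
boston_scaled$crim <- NULL

# add the new categorical variable to the scaled data
boston_scaled <- data.frame(boston_scaled, crime)
```

After these steps the data frame should have the same number of columns as before, with the factor `crime` in place of the numeric `crim`.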