Bucketing a numeric variable into a factor
Your old friend Dan sent you a list of 50 AAA rated bonds called AAA_rank, with each bond having an additional number from 1-100 describing how profitable he thinks that bond will be (100 being the most profitable). You are interested in doing further analysis on his suggestions, but first it would be nice if the bonds were bucketed by their ranking somehow. This would help you create groups of bonds, from least profitable to most profitable, to more easily analyze them.
This is a great example of creating a factor from a numeric vector. The easiest way to do this is to use cut(). Below, Dan's 1-100 ranking is bucketed into 5 evenly spaced groups. Note that the ( in the factor levels means we do not include the number beside it in that group, and the ] means that we do include that number in the group.
head(AAA_rank)
[1] 31 48 100 53 85 73
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 20, 40, 60, 80, 100))
head(AAA_factor)
[1] (20,40] (40,60] (80,100] (40,60] (80,100] (60,80]
Levels: (0,20] (20,40] (40,60] (60,80] (80,100]
In the cut() function, using breaks = allows you to specify the groups that you want R to bucket your data by!
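To make the ( and ] convention concrete, here is a small sketch that runs cut() on values sitting exactly on the break points. The vector edge_values is made up for illustration and is not part of Dan's data; include.lowest and right are standard arguments of base R's cut().

```r
# Same break points as in the example above
breaks <- c(0, 20, 40, 60, 80, 100)

# Illustrative values on or near the break points (not Dan's data)
edge_values <- c(0, 20, 21, 40, 100)

# Default (right = TRUE): intervals are open on the left, closed on the right.
# 0 falls outside (0,20] and becomes NA; 20 lands in (0,20]; 40 in (20,40].
default_cut <- cut(edge_values, breaks = breaks)
print(default_cut)

# include.lowest = TRUE closes the first interval on the left, so 0 is kept
with_lowest <- cut(edge_values, breaks = breaks, include.lowest = TRUE)
print(with_lowest)

# right = FALSE flips the convention: closed on the left, open on the right.
# Now 20 lands in [20,40) and 100 falls outside [80,100) and becomes NA.
left_closed <- cut(edge_values, breaks = breaks, right = FALSE)
print(left_closed)
```

Since Dan's rankings run from 1 to 100, the default settings with a lowest break of 0 should cover every value, which is presumably why the example above needs no extra arguments.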
This is a part of the course “Introduction to R for Finance”.
Exercise instructions
- Instead of 5 buckets, can you create just 4? In breaks = use a vector from 0 to 100 where each element is 25 numbers apart. Assign it to AAA_factor.
- The 4 buckets do not have very descriptive names. Use levels() to rename the levels to "low", "medium", "high", and "very_high", in that order.
- Print the newly named AAA_factor.
- Plot the AAA_factor to visualize your work!
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = ___, breaks = ___)
# Rename the levels
# Print AAA_factor
# Plot AAA_factor
plot(___)
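One possible way to fill in the blanks is sketched below. It is not necessarily the course's reference solution. Because the full 50-element AAA_rank is not shown here, the sketch assumes a stand-in vector built from the six values printed by head(AAA_rank) above; rank_breaks is a helper name introduced only for this example.

```r
# Stand-in data: only the six values shown by head(AAA_rank) in the text.
# The real AAA_rank has 50 elements.
AAA_rank <- c(31, 48, 100, 53, 85, 73)

# Break points 25 apart, from 0 to 100: 0, 25, 50, 75, 100
rank_breaks <- seq(from = 0, to = 100, by = 25)

# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = rank_breaks)

# Rename the levels, in the same order as the intervals
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

# Print AAA_factor
print(AAA_factor)

# Plot AAA_factor: plot() on a factor draws a bar chart of counts per level
plot(AAA_factor)
```

Writing c(0, 25, 50, 75, 100) by hand gives the same break points; seq() is just less error-prone when the spacing is regular. With the stand-in data the "low" bucket happens to be empty, so its bar has height zero.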
Introduction to R for Finance
Learn essential data structures such as lists and data frames and apply that knowledge directly to financial examples.
Questions with answers that fall into a limited number of categories can be classified as factors. In this chapter, you will use bond credit ratings to learn all about creating, ordering, and subsetting factors.