1. Ordering and subsetting factors
Great job! Now, let's talk about some of the more subtle things you can do with factors.
2. Order it
One of the interesting things that you can do to gain more control over your factor is order it. By default, R orders the levels of a factor created from a character vector alphabetically. This is why stocks was assigned the value of 2 in the last video and bonds was assigned 1. Sometimes this isn't what you want. Your data may take on values that have an implied order, like low, medium, and high. This type of data is called ordinal. When creating graphs and tables with this type of data, it is often important to define the correct order explicitly. In the exercises, you will explore this idea with the credit ratings data set, which has a defined order from lowest rating to highest. For now, let's assume that these low, medium, and high categories are your way of ranking the potential profit of a certain stock.
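As a quick illustration of the default alphabetical ordering, here is a minimal sketch. The vector asset_class and its values are made up for this example and are not the data from the previous video.

```r
# Hypothetical data: asset classes stored as a character vector
asset_class <- c("stocks", "bonds", "bonds", "stocks")

# factor() sorts the unique values to build the levels
asset_factor <- factor(asset_class)

levels(asset_factor)      # "bonds" "stocks"
as.integer(asset_factor)  # 2 1 1 2 (bonds is level 1, stocks is level 2)
```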
3. How to order?
From what we know about alphabetical order, R's default order for our ranks is high, low, medium. To see this more explicitly, we can use the ordered function and let R create an ordered factor for us using that alphabetical order. We see that the levels now carry an additional component expressing that high comes before low, which comes before medium. That’s not right!
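A minimal sketch of this step, assuming a made-up character vector named profit_rank (the slide's actual data is not shown in this transcript):

```r
# Hypothetical rankings of potential profit
profit_rank <- c("low", "high", "medium", "low", "high")

# With no levels argument, ordered() falls back to alphabetical order
profit_ordered <- ordered(profit_rank)

is.ordered(profit_ordered)  # TRUE
levels(profit_ordered)      # "high" "low" "medium"

# Under this ordering, low is NOT below high
profit_ordered[1] < profit_ordered[2]  # FALSE
```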
4. How to order?
To define the order yourself, you can add the levels argument when calling ordered. By passing a character vector to levels in the order that we want, R will create an ordered factor with the correct order of low, medium, high.
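Here is a small sketch of passing levels explicitly. The profit_rank vector is invented for illustration.

```r
# Hypothetical rankings of potential profit
profit_rank <- c("low", "high", "medium", "low", "high")

# Spell out the order from lowest to highest
profit_ordered <- ordered(profit_rank, levels = c("low", "medium", "high"))

levels(profit_ordered)  # "low" "medium" "high"

# Comparisons now follow the intended ranking
profit_ordered[1] < profit_ordered[2]  # TRUE, since low < high
as.character(max(profit_ordered))      # "high"
```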
5. Factor subsets
Subsetting a factor works as you might expect, for the most part. If you just wanted to access the low answers in your factor, you could use the brackets as done on the slide. The one thing to note is that even though there are no medium or high answers left in your factor, the levels for them still stick around. This can be unwanted when creating tables with summary, because the unused levels still appear with a count of zero.
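The slide itself is not reproduced here, so the following is only a sketch of what such a subset might look like. It uses an invented factor and a logical condition inside the brackets; the slide may index differently.

```r
# Hypothetical ordered factor of profit rankings
profit_ordered <- ordered(c("low", "high", "medium", "low", "high"),
                          levels = c("low", "medium", "high"))

# Keep only the low answers
low_only <- profit_ordered[profit_ordered == "low"]

length(low_only)   # 2
levels(low_only)   # "low" "medium" "high" (unused levels are kept)
summary(low_only)  # counts: low 2, medium 0, high 0
```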
6. Factor subsets
To make sure that R drops the levels when you subset, just set the drop argument inside of the brackets to TRUE.
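A minimal sketch of the drop argument, again on an invented factor rather than the slide's data:

```r
# Hypothetical ordered factor of profit rankings
profit_ordered <- ordered(c("low", "high", "medium", "low", "high"),
                          levels = c("low", "medium", "high"))

# drop = TRUE removes levels that no longer occur in the subset
low_only <- profit_ordered[profit_ordered == "low", drop = TRUE]

levels(low_only)   # "low"
summary(low_only)  # counts: low 2
```

The result is still an ordered factor, just with a single level.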
7. Let's practice!
Great! Now it's time to practice what you have learned about ordered factors on the credit ratings data set. Good luck!