1. What is a factor?
Do you prefer stocks or bonds? Do you want to make a short-term, medium-term, or long-term investment? Notice that each of these questions has answers restricted to a fixed set of responses, or levels. The overarching variable behind each question is known as a factor. Factors are useful for things such as linear regression, plotting, and tabulating your data when you want to split a calculation across multiple groups.
2. Stocks or bonds?
Looking more closely at the first question, the variable investment would be considered a factor with the levels stocks and bonds. The data you gathered corresponding to the factor, investment, might look like this. Notice how this doesn't look any different than a character vector. Under the hood,
3. Stocks or bonds?
R stores your factor as an integer vector, where bond might correspond to a 1, and stock might correspond to a 2, but on the surface,
4. Stocks or bonds?
you see the names of each level. This helps you interpret the results, while still letting R work with the underlying integer codes.
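To make the idea concrete, here is a minimal sketch that imitates this storage scheme by hand. The answers vector is made-up data, and sort, unique, and match are used only to mimic the behavior; this is an illustration, not R's internal implementation.

```r
# Hypothetical survey answers (illustrative data, not from the course)
answers <- c("stock", "bond", "bond", "stock", "stock")

# The distinct values, sorted alphabetically, play the role of the levels
level_names <- sort(unique(answers))   # "bond" "stock"

# Each answer is replaced by the position of its level
codes <- match(answers, level_names)   # 2 1 1 2 2

# Looking the codes up in the levels recovers the readable labels
level_names[codes]
```

The integer codes are what gets stored, and the level names are what you see when the data is displayed.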
5. Factor creator
But how do you create a factor in R? Intuitively, you use the factor function. To create a factor for investment, type factor, and pass it the vector of answers that you might have collected from people who prefer stocks or bonds. When printed, investment looks just like a character vector, except for a new line labeled Levels, which shows you the unique levels in your factor. To confirm that investment is indeed a factor, you can use the class function, which returns factor. To see that a factor is truly an integer vector under the hood,
6. Factor creator
you can use the general function as.integer to convert your factor to an integer vector. If you need to access the individual levels in investment, you can do so with the levels function.
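The steps just described can be sketched as follows. The answers passed to factor are made-up data, so treat the exact values as an assumption rather than the ones used on the slides.

```r
# Hypothetical answers collected from investors (illustrative data)
investment <- factor(c("stock", "bond", "bond", "stock", "stock"))

# Printing shows the values plus an extra "Levels:" line
investment

# Confirm that the object is a factor
class(investment)        # "factor"

# Reveal the integer codes stored under the hood
as.integer(investment)   # 2 1 1 2 2

# Access the individual levels
levels(investment)       # "bond" "stock"
```

By default, factor orders the levels alphabetically, which is why bond gets the code 1 and stock gets the code 2 here.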
7. cut() it up
Sometimes, you will have to create a factor from a vector that would normally be numeric. Imagine that you had a ranking system for stocks, from 1 to 50, with 50 being the best, and 1 being the worst. Using head, we can see the first 6 observations of ranking. Technically, you could create a factor with 50 levels from this, but it might be more informative to group your stocks a different way. Why not create 5 buckets, 1-10, 11-20, 21-30, 31-40, and 41-50, that each stock ranking will fall into? In R, you can do this with the cut function. You cut the ranking, and define the breaks for your buckets like so. You can check each of the first 6 observations and see that, for example, 36 has been correctly placed in the 30-40 group. Notice that while our first group was technically from 1-10, we defined its breaks as 0 and 10. This is because, by default, R treats the left side of each bucket as exclusive, denoted by the parenthesis, and the right side of each bucket as inclusive, denoted by the bracket. Here, we are excluding 0, meaning we start with 1, and we are including 10, giving us the correct bucket of 1-10. It might seem strange at first, but after an example or two you will get used to the idea.
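Here is a minimal sketch of the bucketing described above. The six ranking values and the name bucket_breaks are assumptions made for illustration; only the break points 0, 10, 20, 30, 40, and 50 follow from the buckets in the text.

```r
# Hypothetical first six rankings (illustrative data)
ranking <- c(36, 4, 10, 50, 1, 22)

# Break points for the five buckets; each bucket is (left, right]
bucket_breaks <- c(0, 10, 20, 30, 40, 50)

# Turn the numeric rankings into a factor of buckets
buckets <- cut(ranking, breaks = bucket_breaks)
buckets

# The five levels, with ( marking exclusive and ] marking inclusive
levels(buckets)   # "(0,10]" "(10,20]" "(20,30]" "(30,40]" "(40,50]"

# A value equal to the lowest break falls outside every bucket
cut(0, breaks = bucket_breaks)   # NA
```

Notice that 36 lands in (30,40] and that 10 lands in (0,10], because the right side of each bucket is inclusive.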
8. Let's practice!
Factors are used all throughout R, so we've prepared some great examples in the next exercises to get you used to creating them. Have fun!