1. Case study introduction
In this course, you’ve learned a lot about working with qualitative variables. We’ve gone from covering how to identify and count the levels to how to reorder, rename, and collapse them. Along the way, we’ve also covered some other tidyverse functions like dplyr's case_when() and tidyr's pivot_longer().
2. FiveThirtyEight graph
In this chapter, you’ll use these recently gained skills to recreate this 538 graph. We first used the data behind this graph in Chapter 2. As discussed there, this data comes from a survey conducted by 538 of 1,040 people on their opinions about flying.
3. Original dataset
Here’s what the original dataset looks like. We have 27 columns, which include a unique identifier for each person and the 26 questions in the survey. For this chapter, we’ll only need the columns that asked people about rude behavior and the one that asked whether the respondent had flown before, so we can eliminate people who haven’t. Before we can start graphing, however, we’ll need to do some data tidying to get it in the right format.
4. Tools recap
Let’s quickly recap some of the tools we’ve learned that can help us tidy up our data in the exercises.
The first three are is.character(), as.factor(), and mutate() with across() and where(). is.character() lets us check whether a variable is a character (returning TRUE or FALSE), and as.factor() converts a variable to a factor. We combine these with mutate(), across(), and where(), which together let us change every column where a condition is met. These are helpful tools when we need to change columns from characters to factors.
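As a quick reminder of how these pieces fit together, here is a minimal sketch. The tibble `toy_survey` and its column names are made up for illustration and are not the actual survey columns.

```r
library(dplyr)

# Hypothetical toy data: one numeric column and two character columns
toy_survey <- tibble(
  respondent_id = c(1, 2, 3),
  seat = c("window", "aisle", "window"),
  flown = c("Yes", "No", "Yes")
)

# Check the type of a single column
is.character(toy_survey$seat)

# Convert every character column to a factor, leaving other columns alone
toy_survey_fct <- toy_survey %>%
  mutate(across(where(is.character), as.factor))

# The character columns are now factors; the numeric column is unchanged
is.factor(toy_survey_fct$seat)
is.numeric(toy_survey_fct$respondent_id)
```

Using where(is.character) means we don't have to list the columns by name, which is handy when a dataset has many character columns.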
5. tidyr pivot_longer()
When working with data, we sometimes find that switching the data from "wide" to "long" format, or vice versa, makes our analysis easier. tidyr's functions pivot_longer() and pivot_wider() help us do that. With pivot_longer(), we can take a wide dataset and transform it into two columns: the first holds the original column names and the second holds the values from those columns.
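Here is a small sketch of that reshaping, assuming a made-up wide tibble called `wide_toy`. The output column names "question" and "answer" are arbitrary choices for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per question, one row per respondent
wide_toy <- tibble(
  favorite_color = c("blue", "red"),
  favorite_food = c("pizza", "tacos")
)

# Pivot every column: old column names go to "question", cell values to "answer"
long_toy <- wide_toy %>%
  pivot_longer(
    cols = everything(),
    names_to = "question",
    values_to = "answer"
  )

long_toy
```

The two columns and two rows of the wide data become four rows in two columns. Any column left out of cols, such as an identifier, would be kept as its own column and repeated for each row.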
6. Select helper functions
Finally, when we're selecting columns from our dataset, we can take advantage of some of dplyr's helper functions. One of these is the function contains(), which lets us select all columns whose names contain a given string. The string can be a whole word or just a few letters. For example, we can select only the columns from our wide dataset that contain the string “favorite”.
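A minimal sketch of that selection might look like this. The tibble `people_toy` and its columns are invented for the example.

```r
library(dplyr)

# Hypothetical wide data with a mix of column names
people_toy <- tibble(
  id = c(1, 2),
  favorite_color = c("blue", "red"),
  favorite_food = c("pizza", "tacos"),
  age = c(34, 27)
)

# Keep only the columns whose names contain "favorite"
favorites <- people_toy %>%
  select(contains("favorite"))

names(favorites)
```

Only favorite_color and favorite_food are kept. By default contains() ignores case, so a column named Favorite_Movie would also match.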
7. Let's practice!
Time to put this into practice.