1. Selecting
This chapter focuses on advanced methods of selecting and transforming columns. There are a ton of variables in the counties dataset, and we often only want to work with a subset of them.
2. Select
We've already seen that we can select the columns that we're interested in by listing them inside the select verb.
3. Select a range
We can also select a range of columns. For example, there is a series of columns containing information on how people get to work. To select this range of columns, from drive through work_at_home, we can write drive:work_at_home.
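Here is a minimal sketch of a range selection. The real counties dataset isn't loaded here, so the code builds a small hypothetical stand-in tibble with only a few commuting columns and made-up values; the column names follow the ones mentioned above, but the data is illustrative only.

```r
library(dplyr)

# Hypothetical toy stand-in for the counties dataset:
# a few columns with made-up values (not real census figures)
counties <- tibble(
  state        = c("State A", "State B"),
  county       = c("County 1", "County 2"),
  drive        = c(80, 10),
  carpool      = c(10, 5),
  transit      = c(2, 60),
  walk         = c(3, 20),
  other_transp = c(1, 3),
  work_at_home = c(4, 2),
  mean_commute = c(25, 40)
)

# drive:work_at_home keeps every column from drive through work_at_home,
# in the order they appear in the table; mean_commute falls outside the range
commute_cols <- counties %>%
  select(state, county, drive:work_at_home)

names(commute_cols)
```

Because the range is positional, it only makes sense when the columns we want sit next to each other in the table.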
4. Select and arrange
By selecting these columns and arranging the rows by drive, we find that the counties where the fewest people drive to work are located in Alaska and New York. Because we've focused on this part of the data, other interesting insights appear too: in New York, people mostly take transit to work, while in the small Alaskan counties, they mostly walk.
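A minimal sketch of combining select with arrange, again on a hypothetical toy table rather than the real counties data (the states, counties, and percentages below are invented). arrange() sorts in ascending order by default, so the county with the lowest drive value comes first.

```r
library(dplyr)

# Hypothetical toy data (made-up values, not real census figures)
counties <- tibble(
  state        = c("State A", "State B", "State C"),
  county       = c("County 1", "County 2", "County 3"),
  drive        = c(85, 5, 40),
  carpool      = c(8, 2, 10),
  transit      = c(1, 70, 5),
  walk         = c(3, 15, 40),
  other_transp = c(1, 3, 3),
  work_at_home = c(2, 5, 2)
)

# Keep the commuting columns, then sort so the lowest drive share comes first
fewest_drivers <- counties %>%
  select(state, county, drive:work_at_home) %>%
  arrange(drive)

fewest_drivers
```

With only the commuting columns left, the other modes (transit, walk) sit right next to drive, which is what makes patterns like "mostly transit" or "mostly walking" easy to spot.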
5. Contains
There are also other useful techniques for selecting columns. dplyr provides "select helpers", which are functions that specify criteria for choosing columns. We'll start with the contains function.
To select state, county, and every column whose name contains the word "work", pass state, county, and contains("work") to the select() function.
Notice that we put "work" in quotes, unlike state and county; this is because select helpers take strings, which must be specified using quotes.
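A minimal sketch of contains(), using a hypothetical one-row table whose column names mimic a few of the counties columns (the values are made up). Note that unemployment does not contain the string "work", so it is left out.

```r
library(dplyr)

# Hypothetical one-row toy table (made-up values)
counties <- tibble(
  state        = "State A",
  county       = "County 1",
  drive        = 80,
  work_at_home = 4,
  private_work = 70,
  public_work  = 20,
  family_work  = 1,
  unemployment = 6
)

# Bare names (state, county) mix freely with a helper that takes a string
work_cols <- counties %>%
  select(state, county, contains("work"))

names(work_cols)
```

The matched columns come back in the order they appear in the original table. contains() ignores case by default, so a column named, say, "Work_Hours" would also match.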
6. Starts with
We can also use starts_with to select only the columns that start with a particular prefix. We could use this to get all of the columns that begin with the word "income", which are generally related to each other.
Select helpers are great for picking just the parts of a table that are relevant to the question.
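A minimal sketch of starts_with(), assuming a hypothetical toy table with a few income-related columns alongside unrelated ones (all values invented for illustration).

```r
library(dplyr)

# Hypothetical one-row toy table (made-up values)
counties <- tibble(
  state              = "State A",
  county             = "County 1",
  population         = 1000,
  income             = 50000,
  income_err         = 2000,
  income_per_cap     = 25000,
  income_per_cap_err = 1000,
  poverty            = 12
)

# Every column whose name begins with "income"
income_cols <- counties %>%
  select(state, county, starts_with("income"))

names(income_cols)
```

Because related columns here share a naming prefix, a single starts_with() call picks up the estimate columns and their error columns together.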
7. Other helpers
dplyr provides a number of select helpers besides contains and starts_with, such as ends_with, which finds columns whose names end in a given string; last_col, which grabs the last column; and matches, which selects columns whose names match a regular expression.
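A minimal sketch of the three other helpers, on a hypothetical toy table (column names borrowed from the counties data, values made up). The regular expression passed to matches() is just one illustrative pattern: names that start with "p" and end with "work".

```r
library(dplyr)

# Hypothetical one-row toy table (made-up values)
counties <- tibble(
  state              = "State A",
  county             = "County 1",
  income             = 50000,
  income_err         = 2000,
  income_per_cap     = 25000,
  income_per_cap_err = 1000,
  private_work       = 70,
  public_work        = 20,
  land_area          = 500
)

# ends_with(): columns whose names end in a given string
err_cols <- counties %>% select(ends_with("_err"))

# last_col(): the final column of the table, whatever it is called
last_one <- counties %>% select(last_col())

# matches(): columns whose names match a regular expression
sector_cols <- counties %>% select(matches("^p.*work$"))

names(err_cols)
names(last_one)
names(sector_cols)
```

matches() is the most flexible of the three, since contains, starts_with, and ends_with can all be written as regular expressions, but the simpler helpers are usually easier to read.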
To discover more about select helpers, check out the dplyr select helpers documentation by running ?select_helpers.
8. Removing a variable
Finally, we can use select to remove variables from a table by adding a minus sign in front of the name of the column to remove. If we decide that the census ID for each county is not very helpful, select(-census_id) removes just that column.
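A minimal sketch of dropping a column with a minus sign, using a hypothetical two-row toy table (the IDs and populations are invented). Every other column is kept in its original order.

```r
library(dplyr)

# Hypothetical toy table (made-up values)
counties <- tibble(
  census_id  = c("1001", "1003"),
  state      = c("State A", "State A"),
  county     = c("County 1", "County 2"),
  population = c(100, 200)
)

# The minus sign drops census_id and keeps everything else
no_id <- counties %>%
  select(-census_id)

names(no_id)
```

The rows are untouched; only the census_id column disappears.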
9. Let's practice!
You've seen that select is a very versatile tool, not only for choosing particular columns but also for removing and reordering them. Let's practice!