1. The filter and arrange verbs
We've seen how the select verb can be used to select particular variables, or columns, from a dataset. Now we'll explore the data to find interesting observations, or rows.
2. Selecting columns
We start with code for selecting four columns of interest: the state, county, population, and unemployment rate. We assign the result to the variable counties_selected and work with that variable from now on.
Right now, notice that the observations are in alphabetical order by state and county.
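As a minimal sketch of this step, assuming a small stand-in data frame (the counties table below and all of its values are made up for illustration and are not the course dataset), the selection could look like this:

```r
library(dplyr)

# Hypothetical stand-in for the counties data; all values are illustrative.
counties <- data.frame(
  state = c("Alabama", "California", "New York", "New York", "Virginia"),
  county = c("Autauga", "Los Angeles", "Kings", "Tompkins", "Fairfax"),
  population = c(50000, 10000000, 2500000, 100000, 1000000),
  unemployment = c(7.5, 10.0, 10.0, 5.0, 5.0),
  income = c(50000, 55000, 45000, 52000, 110000),
  stringsAsFactors = FALSE
)

# Keep only the four columns of interest.
counties_selected <- counties %>%
  select(state, county, population, unemployment)

counties_selected
```

The extra income column is there only to show that select() drops whatever we don't name.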
3. arrange()
We might instead want to order the counties by population.
The arrange() verb sorts your data based on one or more variables. We start with the counties_selected variable, then use the pipe operator to feed it into arrange. Inside the arrange parentheses, we specify the variable to sort by, population in this case.
Because arrange() sorts in ascending order by default, this shows us the counties with the lowest population, which are indeed pretty small: one county in Hawaii has only 85 people.
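A minimal sketch of the ascending sort, assuming a made-up counties_selected table (the values are illustrative, not the course data):

```r
library(dplyr)

# Hypothetical stand-in for counties_selected; all values are illustrative.
counties_selected <- data.frame(
  state = c("Alabama", "California", "New York", "New York", "Virginia"),
  county = c("Autauga", "Los Angeles", "Kings", "Tompkins", "Fairfax"),
  population = c(50000, 10000000, 2500000, 100000, 1000000),
  unemployment = c(7.5, 10.0, 10.0, 5.0, 5.0),
  stringsAsFactors = FALSE
)

# arrange() sorts ascending by default, so the smallest counties come first.
smallest_first <- counties_selected %>%
  arrange(population)

smallest_first
```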
4. desc()
We might be more interested in the counties with the highest population. To find these, we'd change our code only a little bit: wrapping desc() around population.
It looks like the county with the highest population is Los Angeles County, California, which contains one of the biggest cities in the United States. Arrange is a useful verb for finding the most extreme observations in a dataset.
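The descending sort could be sketched like this, again assuming a made-up counties_selected table with illustrative values:

```r
library(dplyr)

# Hypothetical stand-in for counties_selected; all values are illustrative.
counties_selected <- data.frame(
  state = c("Alabama", "California", "New York", "New York", "Virginia"),
  county = c("Autauga", "Los Angeles", "Kings", "Tompkins", "Fairfax"),
  population = c(50000, 10000000, 2500000, 100000, 1000000),
  unemployment = c(7.5, 10.0, 10.0, 5.0, 5.0),
  stringsAsFactors = FALSE
)

# Wrapping the variable in desc() reverses the order: largest counties first.
largest_first <- counties_selected %>%
  arrange(desc(population))

largest_first
```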
5. filter()
Another useful verb is filter(). We can use the filter verb to extract only particular observations from a dataset, based on a condition. Recall that after our first verb, we can add a pipe operator, then add another verb. We can pipe any number of verbs together to transform a dataset in a series of steps.
For example, after the arrange(), we could add filter(state == "New York") to get only counties in the state of New York.
Notice that the observations are filtered, but they're still sorted by population thanks to arrange.
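Here is a rough sketch of chaining the two verbs, assuming a made-up counties_selected table (illustrative values only):

```r
library(dplyr)

# Hypothetical stand-in for counties_selected; all values are illustrative.
counties_selected <- data.frame(
  state = c("Alabama", "California", "New York", "New York", "Virginia"),
  county = c("Autauga", "Los Angeles", "Kings", "Tompkins", "Fairfax"),
  population = c(50000, 10000000, 2500000, 100000, 1000000),
  unemployment = c(7.5, 10.0, 10.0, 5.0, 5.0),
  stringsAsFactors = FALSE
)

# Sort by population (largest first), then keep only New York rows.
# filter() preserves the row order produced by arrange().
new_york <- counties_selected %>%
  arrange(desc(population)) %>%
  filter(state == "New York")

new_york
```

Note the double equals sign: a single = inside filter() is not a comparison, and dplyr will report an error.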
6. filter()
Besides ==, we can filter based on comparison operators like less than (<) or greater than (>). For example, we could filter for counties that have an unemployment rate of less than 6 percent. The condition in the filter would be unemployment < 6.
This tells us that the largest counties with an unemployment rate below 6 percent are Fairfax, Virginia and Salt Lake, Utah.
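A minimal sketch of filtering on a numeric comparison, assuming a made-up counties_selected table (the values are illustrative, so the result will not match the course output):

```r
library(dplyr)

# Hypothetical stand-in for counties_selected; all values are illustrative.
counties_selected <- data.frame(
  state = c("Alabama", "California", "New York", "New York", "Virginia"),
  county = c("Autauga", "Los Angeles", "Kings", "Tompkins", "Fairfax"),
  population = c(50000, 10000000, 2500000, 100000, 1000000),
  unemployment = c(7.5, 10.0, 10.0, 5.0, 5.0),
  stringsAsFactors = FALSE
)

# Largest counties first, keeping only those with unemployment below 6 percent.
low_unemployment <- counties_selected %>%
  arrange(desc(population)) %>%
  filter(unemployment < 6)

low_unemployment
```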
7. Combining conditions
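Finally, we can combine multiple conditions in a single filter. We've filtered for the state of New York and for unemployment below 6 percent separately, but we can do both at the same time by separating the conditions with a comma: filter(state == "New York", unemployment < 6). Only rows that satisfy both conditions are kept.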
It looks like only a few counties in New York have an unemployment rate that low.
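Combining both conditions could be sketched like this, assuming a made-up counties_selected table (illustrative values, not the course data):

```r
library(dplyr)

# Hypothetical stand-in for counties_selected; all values are illustrative.
counties_selected <- data.frame(
  state = c("Alabama", "California", "New York", "New York", "Virginia"),
  county = c("Autauga", "Los Angeles", "Kings", "Tompkins", "Fairfax"),
  population = c(50000, 10000000, 2500000, 100000, 1000000),
  unemployment = c(7.5, 10.0, 10.0, 5.0, 5.0),
  stringsAsFactors = FALSE
)

# Conditions separated by a comma are combined with AND:
# a row must be in New York and have unemployment below 6 percent.
ny_low_unemployment <- counties_selected %>%
  arrange(desc(population)) %>%
  filter(state == "New York", unemployment < 6)

ny_low_unemployment
```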
8. Let's practice!
Throughout the course, we'll continue to discover new verbs to answer increasingly interesting questions. Let's practice!