
The filter verb

1. The filter verb

Now that you've been introduced to the gapminder data, you'll learn the tools to work with it. In the rest of this chapter, you'll learn about the "verbs" in the dplyr package: the atomic steps you use to transform data. The first verb you'll use is filter.

2. The filter verb

You use filter when you want to look only at a subset of your observations, based on a particular condition. Filtering data is a common first step in an analysis. Every time you apply a verb, you'll use a pipe.

3. Filtering for one year
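
The code shown on this slide is not part of the transcript. A minimal sketch of what the narration describes, assuming the gapminder and dplyr packages are installed (the name gapminder_2007 is only illustrative), might look like this:

```r
library(gapminder)  # provides the gapminder dataset
library(dplyr)      # provides filter() and the %>% pipe

# Take gapminder, then keep only the rows where year equals 2007.
# Note the double equals: == compares values, it does not assign.
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

gapminder_2007   # the filtered data: one row per country
nrow(gapminder)  # the original gapminder object is unchanged
```

Storing the result under a new name is optional. Without the assignment, the filtered data is simply printed to the screen.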

A pipe is written as a percent sign, a greater-than sign, and another percent sign (%>%). It says "take whatever is before it, and feed it into the next step." After the pipe, we can perform our first verb. We have data on many years, but we'd like to filter for just one. Let's say we filter for 2007, the most recent year in the dataset. The "year equals equals 2007" (year == 2007) is the condition we are using to filter observations. The "equals equals" may be surprising: it's what we call a "logical equals", an operation that compares two values, here each year and the number 2007. A single equals here would mean something different in R, which you'll see later. Here, we're saying we want to filter for only the observations from 2007. Let's see what this code outputs. Notice that now we have only 142 rows: that's how many countries are in the dataset. It's important to note that you're not removing any rows from the original gapminder data. You can still use the gapminder object for other analyses, and it won't be any different from how it was before. Instead, filter is returning a new dataset, one with fewer rows, that then gets printed to the screen. You could choose another condition to filter on,

4. Filtering for one country

besides the year. For example, suppose we wanted to get only the observations from the United States. We would write this as "filter country equals equals quote United States endquote", resulting in only the 12 observations from that country, one for each year of data. The quotes around United States are important: otherwise R won't understand that the words "United" and "States" are a text value, as opposed to variable names. You didn't need quotes around a number like 2007, but you do around text. Finally,

5. Filtering for two variables

we can specify multiple conditions in the filter. Each of the conditions is separated by a comma: here we are saying we want only the one observation for the year 2007, comma, where the country is the United States. An observation is kept only if it satisfies every condition. Each of these equals equals expressions is called an argument. This kind of double filter is useful for extracting a single observation you're interested in. You'll be able to practice this in the exercises.
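
As a sketch of the last two slides, again assuming the gapminder and dplyr packages are installed (the result names are illustrative, not from the course), the country filter and the two-condition filter might be written as:

```r
library(gapminder)
library(dplyr)

# Text values need quotes; numbers such as 2007 do not
us_all_years <- gapminder %>%
  filter(country == "United States")

# Conditions separated by a comma must all be true for a row to be kept
us_2007 <- gapminder %>%
  filter(year == 2007, country == "United States")

nrow(us_all_years)  # 12, one row per year of data
nrow(us_2007)       # 1, the single observation of interest
```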

6. Let's practice!