
The group_by verb

1. The group_by verb

In the last set of exercises, you learned to use the summarize verb to answer questions about the entire dataset, or about a particular year.

2. The summarize verb

For example, here you're finding the average life expectancy and the total population in the year 2007. What if you were interested in the average not just for 2007, but for each of the years in the dataset? You could rerun this code and change the year each time, but that's very tedious. Instead, you can use the group_by verb, which tells dplyr to summarize within groups instead of summarizing the entire dataset.
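The contrast between the two approaches can be sketched as follows. This is a minimal illustration, not the course's own code: the toy data frame and its values are made up, and the column names lifeExp and pop are assumptions inferred from the summary names meanLifeExp and totalPop used in this lesson.

```r
library(dplyr)

# Hypothetical toy data standing in for the course dataset; values are made up.
toy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Asia", "Asia", "Europe"),
  year      = c(2002, 2002, 2002, 2007, 2007, 2007),
  lifeExp   = c(60, 70, 76, 62, 72, 78),
  pop       = c(100, 200, 50, 110, 210, 52)
)

# One year at a time: filter first, then summarize the remaining rows.
one_year <- toy %>%
  filter(year == 2007) %>%
  summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))

# Every year at once: group_by(year) makes summarize work within each year.
by_year <- toy %>%
  group_by(year) %>%
  summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))

print(one_year)
print(by_year)
```

If the population column in a real dataset is stored as an integer, sum(pop) can overflow and return NA, so summing as.numeric(pop) may be the safer choice there.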

3. Summarizing by year

Notice that this replaces filter(year == 2007) with group_by(year). group_by(year) tells the summarize step to perform the summary within each year: within 1952, then within 1957, then within 1962, and so on, and then combine the results. Instead of getting one row overall, you now get one row for each year. There's now a year variable along with the new meanLifeExp and totalPop variables. This shows that the total population started at 2.4 billion and went up to 6.25 billion in 2007. You can also see that average life expectancy went up from 49 years in 1952 to 67.

4. Summarizing by continent

You can summarize by other variables besides year. Suppose you're interested in the average life expectancy and the total population in 2007 within each continent. You can find this by first filtering for the year 2007, then grouping by continent (instead of year), and then performing your summary. This results in a table with one row for each continent, with columns for mean life expectancy and total population. You can see that Europe and Oceania have the highest life expectancy, and that Asia and Africa are lower. Now that you've calculated these statistics for each continent in 2007, you might be interested in how they changed for each continent over time.
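The filter, then group, then summarize sequence described above might look like this. It is a sketch on made-up toy data, with lifeExp and pop as assumed column names, so the numbers it prints are not the real continent figures.

```r
library(dplyr)

# Hypothetical toy data standing in for the course dataset; values are made up.
toy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Asia", "Asia", "Europe"),
  year      = c(2002, 2002, 2002, 2007, 2007, 2007),
  lifeExp   = c(60, 70, 76, 62, 72, 78),
  pop       = c(100, 200, 50, 110, 210, 52)
)

# Keep only 2007, then summarize within each continent.
by_continent <- toy %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))

print(by_continent)
```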

5. Summarizing by continent and year

To do so, you can summarize by both year and continent by passing both variables to group_by: group_by(year, continent). Now the output has one row for each combination of a year and a continent. For example, you see the total population and average life expectancy in 1952 for Africa, the Americas, Asia, Europe, and Oceania, followed by each of the continent-level summaries for 1957. In the next video, you'll learn how to visualize this per-year, per-continent data to understand trends over time.
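Grouping by two variables at once can be sketched like this, again on made-up toy data with lifeExp and pop as assumed column names. Recent versions of dplyr may print an informational message about how the result is grouped, which does not affect the summary values.

```r
library(dplyr)

# Hypothetical toy data standing in for the course dataset; values are made up.
toy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Asia", "Asia", "Europe"),
  year      = c(2002, 2002, 2002, 2007, 2007, 2007),
  lifeExp   = c(60, 70, 76, 62, 72, 78),
  pop       = c(100, 200, 50, 110, 210, 52)
)

# One output row per (year, continent) combination present in the data.
by_year_continent <- toy %>%
  group_by(year, continent) %>%
  summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))

print(by_year_continent)
```

With two years and two continents in the toy data, the result has four rows, ordered by year and then by continent.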

6. Let's practice!
