Visualizing summarized data
1. Visualizing summarized data
In this chapter you learned to use the group_by and summarize verbs to summarize the gapminder data by year, by continent, or by both. Now you'll learn how to turn those summaries into informative visualizations, by returning to the ggplot2 package from Chapter 2.
2. Summarizing by year
In the last video we summarized data by year, to find the change in population and in mean life expectancy over time. Now instead of viewing the summarized data as a table, let's save it as an object called by_year, so you can visualize the data using ggplot2.
3. Visualizing population over time
You would construct the graph with the three steps of ggplot2: the data, which is by_year; the aesthetics, which put year on the x-axis and total population on the y-axis; and the type of graph, which in this case is a scatter plot, represented by geom_point. Notice that the steps are the same as when you were graphing countries in a scatter plot, even though it's a new dataset. The resulting graph of population by year shows the change in the total population, which is going up over time. ggplot2 puts the y-axis in scientific notation, since showing it with nine zeros would be hard to read. The global population starts a little under 3 times 10 to the 9th power (that's three billion) and goes up to more than 6 billion. You might notice that the graph is a little misleading because it doesn't include zero: you don't have a sense of how much the population grew relative to where it was when it started. This is a good time to introduce another graphing option.
4. Starting y-axis at zero
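As a rough sketch of the plot under discussion, the code below builds the by-year summary, draws the scatter plot, and then draws a second version with the y-axis extended to zero. It assumes the data comes from the gapminder package; the column names totalPop and meanLifeExp are illustrative choices, not names confirmed by the transcript. The as.numeric() conversion is also an assumption, added because the packaged pop column is an integer and the yearly totals exceed R's integer range.

```r
library(gapminder)  # assumed source of the gapminder data frame
library(dplyr)
library(ggplot2)

# Summarize by year. totalPop and meanLifeExp are assumed column names.
# as.numeric() avoids integer overflow when summing billions of people.
by_year <- gapminder %>%
  group_by(year) %>%
  summarize(totalPop = sum(as.numeric(pop)),
            meanLifeExp = mean(lifeExp))

# The three steps: data (by_year), aesthetics (x and y), and geom (points)
p_year <- ggplot(by_year, aes(x = year, y = totalPop)) +
  geom_point()
print(p_year)

# Same plot, with the y-axis limits expanded to include zero
p_year_zero <- p_year + expand_limits(y = 0)
print(p_year_zero)
```

Swapping y = totalPop for y = meanLifeExp in aes() would give the life expectancy version of the same graph.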
By adding expand_limits(y = 0) to the end of the ggplot call, you can specify that you want the y-axis to start at zero. Notice that you added it to the end just like you would with scale_x_log10 or facet_wrap. Now the graph makes it clearer that the population is almost tripling during this time. You could have created other graphs of summarized data, such as a graph of the average life expectancy over time, by changing the y aesthetic.
5. Summarizing by year and continent
So far you've been graphing the by-year summarized data. But you have also learned to summarize after grouping by both year and continent, to see how the changes in population have occurred separately within each continent. Since you now have data over time within each continent, you need a way to separate it in a visualization.
6. Visualizing population by year and continent
To do that you can use the color aesthetic you learned about in Chapter 2. By setting color equals continent, you can show five separate trends on the same graph. This lets us see that Asia was always the most populated continent and has been growing the most rapidly, that Europe has a slower rate of growth, and that Africa has grown to surpass both Europe and the Americas in terms of population. In Chapter 4 you'll learn to turn these into line plots that are a bit better for presenting data over time. You'll often combine dplyr verbs and ggplot2 visualizations as part of an exploratory analysis, so it's important to get into the habit of visualizing summarized or processed data.
7. Let's practice!
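Before practicing, here is a minimal sketch of the year-and-continent plot described in this lesson. The object name by_year_continent and the column names totalPop and meanLifeExp are assumptions made for illustration, as is loading the data from the gapminder package. Starting the y-axis at zero with expand_limits is an optional choice carried over from the by-year plot.

```r
library(gapminder)  # assumed source of the gapminder data frame
library(dplyr)
library(ggplot2)

# Summarize within each year and continent combination.
# Object and column names here are assumed, not taken from the transcript.
by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(totalPop = sum(as.numeric(pop)),  # avoid integer overflow
            meanLifeExp = mean(lifeExp),
            .groups = "drop")

# Map continent to color so each continent gets its own trend of points
p_continent <- ggplot(by_year_continent,
                      aes(x = year, y = totalPop, color = continent)) +
  geom_point() +
  expand_limits(y = 0)
print(p_continent)
```

Because color is set inside aes(), ggplot2 adds a legend for the continents automatically.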