1. Visualizing grain yields
Now that you've dealt with the unit issue, it's time to take a look at the datasets.
2. The corn dataset
Here's the current state of the corn dataset. The variable that looks the most interesting to me is the yield. I'm curious to know how it changes over time, and between the different states.
So that you can visualize it, here's a reminder of some ggplot2 and dplyr techniques.
3. ggplot2: drawing multiple lines
To draw multiple lines using ggplot2, you use geom_line to add a line layer, and use the group aesthetic to specify which rows of data belong to each line.
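As a rough sketch of what that looks like, here is one line per state. The data frame corn_toy and its values are made up for illustration; the column names year, state, and yield_kg_per_ha are assumptions about how the corn dataset is laid out.

```r
library(ggplot2)

# Hypothetical toy data: two states, five years each (values are made up)
corn_toy <- data.frame(
  year = rep(2001:2005, times = 2),
  state = rep(c("Iowa", "Illinois"), each = 5),
  yield_kg_per_ha = c(9100, 9300, 9800, 10100, 10400,
                      8900, 9000, 9500, 9700, 10200)
)

# group = state tells geom_line which rows belong to the same line
p_lines <- ggplot(corn_toy, aes(year, yield_kg_per_ha)) +
  geom_line(aes(group = state))

p_lines
```

Without the group aesthetic, ggplot2 would connect all ten rows into a single zigzag line.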
4. ggplot2: smooth trends
You'll also be drawing smooth trend lines, which you do by adding a geom_smooth layer.
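Here is a minimal sketch of a trend line drawn on top of the per-state lines. Again, corn_toy is hypothetical data generated from a formula, and the column names are assumptions. With no arguments, geom_smooth picks its own smoothing method and prints a message saying which one it used.

```r
library(ggplot2)

# Hypothetical toy data: two states, twenty years each (values are made up)
years <- 2000:2019
corn_toy <- data.frame(
  year = rep(years, times = 2),
  state = rep(c("Iowa", "Illinois"), each = length(years)),
  yield_kg_per_ha = c(
    9000 + 100 * (years - 2000) + 300 * sin(years),
    8800 + 90 * (years - 2000) + 300 * cos(years)
  )
)

# One line per state, plus a single smooth trend across all rows
p_smooth <- ggplot(corn_toy, aes(year, yield_kg_per_ha)) +
  geom_line(aes(group = state)) +
  geom_smooth()

p_smooth
```

Because the group aesthetic is set only inside geom_line, the smooth layer sees all rows as one group and draws a single overall trend.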
5. ggplot2: facetting
To split a plot into multiple facets -- that is, multiple panels -- you use facet_wrap. The name of the variable to split the plot by is enclosed in a call to vars.
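A small sketch of facetting, assuming the data already has a census_region column. The data frame corn_toy and its yield values are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data: three states in three different regions
corn_toy <- data.frame(
  year = rep(2001:2005, times = 3),
  state = rep(c("Iowa", "Illinois", "Texas"), each = 5),
  census_region = rep(
    c("West North Central", "East North Central", "West South Central"),
    each = 5
  ),
  yield_kg_per_ha = c(9100, 9300, 9800, 10100, 10400,
                      8900, 9000, 9500, 9700, 10200,
                      7000, 7200, 7100, 7600, 7900)
)

# vars(census_region) names the variable that defines the panels
p_facets <- ggplot(corn_toy, aes(year, yield_kg_per_ha)) +
  geom_line(aes(group = state)) +
  facet_wrap(vars(census_region))

p_facets
```

This gives one panel per region, each sharing the same axes by default.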
6. USA Census regions
In order to look at regional differences in yields, you'll be using the nine USA Census regions. The Corn Belt, where corn is traditionally grown in the USA, is centered around the West North Central region and parts of the East North Central region. The Wheat Belt is in the West South Central region.
7. dplyr inner joins
An inner join merges two data frames, keeping only the rows where a chosen column has the same value in both.
To use it, pipe from the first data frame to inner_join, pass the second data frame, and give the name of the column to match on as a string in the by argument.
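To make that concrete, here is a sketch with two tiny data frames. Both corn_toy and regions_toy are hypothetical, the yield values are made up, and "Atlantis" is a deliberately fake state so you can see what happens to a row with no match.

```r
library(dplyr)

# Hypothetical yields by state; "Atlantis" has no match in the lookup table
corn_toy <- data.frame(
  state = c("Iowa", "Illinois", "Texas", "Atlantis"),
  yield_kg_per_ha = c(10400, 10200, 7900, 5000)
)

# Hypothetical lookup table from state to Census region
regions_toy <- data.frame(
  state = c("Iowa", "Illinois", "Texas"),
  census_region = c("West North Central", "East North Central",
                    "West South Central")
)

# Keep only rows whose state appears in both data frames
corn_with_regions <- corn_toy %>%
  inner_join(regions_toy, by = "state")

corn_with_regions
```

The result has the columns of both data frames, and the unmatched "Atlantis" row is dropped.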
8. Let's practice!
Time to explore the crop datasets!