Combining and joining census geographic datasets
1. Combining and joining census geographic datasets
In the previous lessons, you've learned how to work with data you've obtained using the tigris package. In many workflows, you'll then want to combine these datasets with each other or join them to external datasets.
2. The "tigris" attribute
When tigris loads Census Bureau geographic data into R, it gives each dataset a special tigris attribute that identifies its type, such as tracts, states, or roads. This lets users check whether different datasets are of the same Census dataset type, like the Census tract datasets for Missouri and Kansas shown in the slide.
3. Combining datasets with rbind_tigris()
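The attribute check from the previous slide can be sketched roughly as follows. This is a minimal illustration, not the slide's exact code: the object names ks_tracts and mo_tracts and the choice of cb = TRUE are assumptions, and the calls download data, so they need an internet connection.

```r
library(tigris)

# Download Census tracts for two states.
# cb = TRUE requests the smaller cartographic boundary files.
ks_tracts <- tracts(state = "KS", cb = TRUE)
mo_tracts <- tracts(state = "MO", cb = TRUE)

# Each object carries a "tigris" attribute naming its dataset type.
attr(ks_tracts, "tigris")
attr(mo_tracts, "tigris")

# TRUE when both datasets are of the same Census dataset type.
same_type <- identical(attr(ks_tracts, "tigris"), attr(mo_tracts, "tigris"))
same_type
```

Note that the attribute is set when tigris loads the data; later transformations with other packages may not preserve it.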
The tigris attribute is important when using the tigris function rbind_tigris() to combine datasets. For example, an analyst studying the Kansas City metropolitan area would commonly want to combine data for Kansas and Missouri, given that the metropolitan area extends into both states. rbind_tigris() wraps the equivalent functions from the sp and sf packages for combining spatial datasets but also checks the datasets' tigris attributes to avoid combining datasets of different geographic types.
4. Combining datasets with tidyverse tools
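As a sketch of the Kansas and Missouri case just described, the two tract datasets can be passed directly to rbind_tigris(). The object names and cb = TRUE are illustrative assumptions, and the downloads require an internet connection.

```r
library(tigris)

# Tracts for the two states that the Kansas City metro area spans.
ks_tracts <- tracts(state = "KS", cb = TRUE)
mo_tracts <- tracts(state = "MO", cb = TRUE)

# Combine them; rbind_tigris() first checks that both objects
# share the same "tigris" attribute.
ks_mo_tracts <- rbind_tigris(ks_tracts, mo_tracts)

# The combined dataset should hold every tract from both inputs.
nrow(ks_mo_tracts) == nrow(ks_tracts) + nrow(mo_tracts)
```

Passing objects of different types, for example tracts together with counties, should cause rbind_tigris() to stop with an error rather than return a mixed dataset.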
rbind_tigris() can also accept a list of spatial objects obtained with the tigris package, which can be useful when working with more than two states. In the example shown on the slide, we generate a vector of four states in the US region of New England: Maine, New Hampshire, Vermont, and Massachusetts. We then use the tidyverse map() function to iterate through these states and generate a list of Census tract datasets, obtained with the tracts() function, for each state. This list can then be piped to rbind_tigris() to combine the result, giving us a New England tract dataset as output.
5. Joining data from a data frame
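The New England example from the previous slide might look something like the sketch below. The variable names and the use of cb = TRUE are assumptions rather than the slide's exact code; map() and the pipe come from the purrr package.

```r
library(tigris)
library(purrr)

# The four New England states named in the example.
new_england <- c("ME", "NH", "VT", "MA")

# Download tracts for each state into a list, then combine the
# list into a single dataset with rbind_tigris().
ne_tracts <- new_england %>%
  map(function(st) tracts(state = st, cb = TRUE)) %>%
  rbind_tigris()

# Number of tracts per state FIPS code in the combined result.
table(ne_tracts$STATEFP)
```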
Analysts will also commonly want to join data from external data frames to Census boundary files. If working with sp classes, tigris includes a helper function named geo_join() to assist with this. If using sf classes, dplyr's suite of joining functions is available through the sf package. In this example, we are interested in associating a dataset of Texas legislators' names and political parties with a dataset of the legislative district boundaries they represent, obtained with tigris. To accomplish this, we can use the state_legislative_districts() function in tigris to get the appropriate boundary file. We then use the left_join() function to join the legislator information to the boundaries dataset based on matching values. In this case, the NAME column in the spatial dataset matches the District column in the legislator information dataset.
6. Joining data from a data frame
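A rough sketch of this join is shown below. The legislator table here is a made-up stand-in with placeholder values (the real external dataset is not reproduced in this transcript), and the object names tx_house, tx_members, and tx_joined, the house = "lower" setting, and cb = TRUE are all assumptions for illustration.

```r
library(tigris)
library(sf)
library(dplyr)

# Boundaries for the Texas state house districts (needs internet access).
tx_house <- state_legislative_districts(state = "TX", house = "lower", cb = TRUE)

# Hypothetical stand-in for the external legislator data frame.
# District is stored as character so it can match the NAME column.
tx_members <- data.frame(
  District = c("1", "2", "3"),
  Name = c("Member A", "Member B", "Member C"),
  Party = c("Party X", "Party Y", "Party X")
)

# Keep every district and attach legislator columns where NAME == District.
tx_joined <- left_join(tx_house, tx_members, by = c("NAME" = "District"))

glimpse(tx_joined)
```

Because left_join() keeps every row of the spatial dataset, districts with no match in the data frame remain in the result with NA in the joined columns.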
Here, we see the output of our join operation as shown by the glimpse() function. The resulting dataset includes all the standard geographic information such as Census code and land and water area, but also the names, cities, and political parties of the representatives of those legislative districts.
7. Let's practice!
Now, it's your turn to try combining and joining tigris datasets.