
Understanding census geography and tigris basics

1. Understanding census geography and tigris basics

A wide variety of geographic datasets are available to analysts from the US Census Bureau. In this chapter, you'll learn how to acquire, process, and map these datasets, all from within R.

2. TIGER/Line Shapefiles

The US Census Bureau stores geographic data in its Topologically Integrated Geographic Encoding and Referencing (TIGER) database. Extracts from this database are made publicly available by the Census Bureau through its repository of TIGER/Line shapefiles. Shapefiles are a common format for encoding vector geographic data as points, lines, and polygons, and can be loaded into R.

3. The tigris R package

Several packages are available to R users to load shapefiles as R objects. The process of downloading and then loading TIGER/Line files from the Census website, however, can be cumbersome. The tigris R package simplifies this process with a suite of functions to download these shapefiles and then load them automatically into R.

4. Legal and statistical entities

Most TIGER/Line datasets are boundary files, which delineate the geographies within which decennial Census and American Community Survey data are aggregated. Boundary files represent either legal entities, which are areas that have legal standing in the United States, such as states and counties, or statistical entities, which are areas designed by the Census Bureau for data tabulation. The example here shows how to use the counties() function to obtain a county boundaries dataset for the US state of Arizona.
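The counties() call described above can be sketched roughly as follows. This is a minimal example and not the exact code from the slide: the object name az_counties is my own choice, and the plotting branch is an assumption added so the sketch works whether your installed tigris version returns an sp object or an sf object.

```r
library(tigris)

# Download and load county boundaries for Arizona.
# The state can be given as a postal abbreviation.
az_counties <- counties(state = "AZ")

# Draw the county outlines. The branch is an assumption of this sketch:
# sf objects plot one panel per attribute unless only the geometry is passed.
if (inherits(az_counties, "sf")) {
  plot(sf::st_geometry(az_counties))
} else {
  plot(az_counties)
}
```

Running this requires an internet connection, since tigris fetches the shapefile from the Census Bureau when the function is called.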

5. Geographic features

The TIGER/Line datasets also include geographic features, a series of datasets useful for thematic mapping and spatial analysis. These features include transportation datasets, such as roads and railroads, as well as linear and area water features. In this example, the primary_secondary_roads() function is used to obtain major roads data for the US state of New Hampshire.
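A rough sketch of the roads example might look like the code below. The object name nh_roads is an assumption for illustration, and the plotting branch is again only there so the sketch runs regardless of which object class your tigris version returns.

```r
library(tigris)

# Download primary and secondary roads for New Hampshire
nh_roads <- primary_secondary_roads(state = "NH")

# Plot the road network as lines; for sf objects, pass only the geometry
if (inherits(nh_roads, "sf")) {
  plot(sf::st_geometry(nh_roads))
} else {
  plot(nh_roads)
}
```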

6. tigris and Spatial objects

By default, tigris returns Census geographic datasets as objects of class Spatial*DataFrame (for example, SpatialPolygonsDataFrame or SpatialLinesDataFrame), the structure used by the sp R package for representing spatial objects. Spatial*DataFrame objects include a series of slots that encode different characteristics of the datasets, including their geometry, attributes, and coordinate systems. The slide shows a sample of rows from the data slot of the New Hampshire roads dataset, along with the coordinate system information stored in the proj4string slot. Census shapefiles, and by extension datasets obtained with tigris, are stored in the North American Datum of 1983 geographic coordinate system. To learn more about coordinate reference systems in R, I recommend the book Geocomputation with R, which is linked in the slide.
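To inspect these pieces yourself, something like the sketch below should work. One caveat: the default described here applies to the tigris version used in this course, while more recent releases return sf objects by default, so the sketch checks the class first and handles both cases. The object name nh_roads and the branching logic are assumptions for illustration, not the code from the slide.

```r
library(tigris)

nh_roads <- primary_secondary_roads(state = "NH")

# Check which class this version of tigris returned
print(class(nh_roads))

if (inherits(nh_roads, "Spatial")) {
  # sp object: attributes and coordinate system are stored in slots
  print(head(nh_roads@data))
  print(nh_roads@proj4string)
} else {
  # sf object: attributes are regular columns and the CRS is read with st_crs()
  print(head(sf::st_drop_geometry(nh_roads)))
  print(sf::st_crs(nh_roads))
}
```

In either case, the printed coordinate system information should reference the NAD83 datum.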

7. Let's practice!

Let's try out these examples in R.