
Adding data to spatial objects

1. Adding data to spatial objects

So far, you've got polygons for the tracts, but you don't have any income data for these tracts.

2. Income data from ACS

The income data come from the American Community Survey (ACS); we obtained them using the acs package and provide them to you in the nyc_income data frame. You'll notice it has 288 rows, one for each tract. The tract column gives a tract ID, the estimate column gives the estimated median income, and the se column gives a standard error for this estimate.

3. Tract polygon data

Compare this to the nyc_tracts SpatialPolygonsDataFrame. This object describes 288 polygons, one for each tract, and stores some information about these polygons in its data slot.

4. Tract polygon data

If we take a closer look at this data slot, we see a TRACTCE column that looks like it holds the same kind of tract IDs as our income data. To create a choropleth map, we need to add the income estimates to this data frame. The Spatial___DataFrame classes are useful because they keep the spatial information for each element together with the data associated with that element. What isn't clear, though, is how these classes manage the correspondence between the two.
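Before attaching the income estimates, it can help to confirm that the two ID columns really describe the same tracts. The sketch below assumes the sp package is installed and uses small hypothetical stand-ins for nyc_tracts and nyc_income (`toy_tracts` and `toy_income`, with invented tract codes and figures, built with a hypothetical `make_square()` helper) rather than the course data.

```r
library(sp)

# Hypothetical helper: a unit square Polygons object with the given ID
make_square <- function(x0, id) {
  # Clockwise ring, closed by repeating the first vertex
  ring <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

# Hypothetical stand-in for nyc_tracts: four squares, TRACTCE in the data slot
codes <- c("T01", "T02", "T03", "T04")
toy_polys <- SpatialPolygons(
  lapply(seq_along(codes), function(i) make_square(i - 1, codes[i]))
)
toy_tracts <- SpatialPolygonsDataFrame(
  toy_polys,
  data.frame(TRACTCE = codes, row.names = codes, stringsAsFactors = FALSE)
)

# Hypothetical stand-in for nyc_income: same tracts, different row order
toy_income <- data.frame(
  tract = c("T03", "T01", "T04", "T02"),
  estimate = c(52000, 61000, 47000, 58000),
  se = c(3100, 2800, 3500, 2900),
  stringsAsFactors = FALSE
)

tract_ids <- toy_tracts@data$TRACTCE
# One income row per polygon
same_count <- nrow(toy_income) == length(toy_tracts@polygons)
# No tract appears twice on either side
no_dups <- anyDuplicated(toy_income$tract) == 0 && anyDuplicated(tract_ids) == 0
# Exactly the same set of IDs on both sides
same_ids <- setequal(toy_income$tract, tract_ids)
# ...but not in the same order, so pairing rows by position would be wrong
same_order <- identical(toy_income$tract, tract_ids)

c(same_count = same_count, no_dups = no_dups,
  same_ids = same_ids, same_order = same_order)
```

If the set check fails on real data, it is worth inspecting the two columns' types and formatting before going any further.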

5. Correspondence between polygons and data

Let's explore this. Here we have a simple SpatialPolygons object, that is, just the spatial information, for four tracts. Each polygon in this object has an ID slot. You can see these IDs by iterating over the polygons slot with sapply, pulling out the ID slot each time. They have the IDs 156 through 159. Separately, we also have a data frame with tract identifiers. Its rownames correspond to our polygons, although they happen to be in reverse order.
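The setup described here can be sketched as follows, assuming the sp package is installed. The square geometries, the `tract` labels and the `make_square()` helper are hypothetical; only the IDs 156 to 159 and the reversed rownames mirror the example.

```r
library(sp)

# Hypothetical helper: a unit square Polygons object with the given ID
make_square <- function(x0, id) {
  # Clockwise ring, closed by repeating the first vertex
  ring <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

# Four squares standing in for tracts 156 to 159
ids <- c("156", "157", "158", "159")
polys <- SpatialPolygons(
  lapply(seq_along(ids), function(i) make_square(i - 1, ids[i]))
)

# Iterate over the polygons slot, pulling out each ID slot
poly_ids <- sapply(polys@polygons, function(p) slot(p, "ID"))
poly_ids
# [1] "156" "157" "158" "159"

# A data frame whose rownames are the same IDs, in reverse order
tract_df <- data.frame(
  tract = c("tract_159", "tract_158", "tract_157", "tract_156"),
  row.names = c("159", "158", "157", "156"),
  stringsAsFactors = FALSE
)
```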

6. Correspondence between polygons and data

If we use the SpatialPolygonsDataFrame function to combine these two objects, the IDs of the polygons are matched to the rownames of the data frame,

7. Correspondence between polygons and data

and if you examine the data slot of the resulting object, you can see it has been reordered to match the order of the polygons.
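A minimal sketch of this matching, rebuilding the same hypothetical four-square example so it runs on its own (sp assumed installed; the geometry, labels and `make_square()` helper are illustrative). `match.ID` defaults to `TRUE` in `SpatialPolygonsDataFrame()`, so no extra argument is needed.

```r
library(sp)

# Hypothetical helper: a unit square Polygons object with the given ID
make_square <- function(x0, id) {
  ring <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

ids <- c("156", "157", "158", "159")
polys <- SpatialPolygons(
  lapply(seq_along(ids), function(i) make_square(i - 1, ids[i]))
)
# Rownames are the polygon IDs, but in reverse order
tract_df <- data.frame(
  tract = c("tract_159", "tract_158", "tract_157", "tract_156"),
  row.names = c("159", "158", "157", "156"),
  stringsAsFactors = FALSE
)

# Polygon IDs are matched to rownames, so the data rows get reordered
tracts_spdf <- SpatialPolygonsDataFrame(polys, tract_df)

tracts_spdf@data$tract
# [1] "tract_156" "tract_157" "tract_158" "tract_159"
```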

8. Correspondence between polygons and data

If you turn off this matching by specifying match.ID = FALSE, the data rows are paired with the polygons by position, on the assumption that both are already in the same order: the first polygon is assumed to correspond to the first row of the data, and so on. Here that assumption is wrong, so you would end up associating the wrong tract IDs with the polygons.
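Turning matching off in the same hypothetical example shows the failure mode: the data frame is attached in the order given, so polygon 156 ends up paired with the row describing tract 159. As before, this is a sketch assuming sp is installed, with made-up geometry, labels and a hypothetical `make_square()` helper.

```r
library(sp)

# Hypothetical helper: a unit square Polygons object with the given ID
make_square <- function(x0, id) {
  ring <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

ids <- c("156", "157", "158", "159")
polys <- SpatialPolygons(
  lapply(seq_along(ids), function(i) make_square(i - 1, ids[i]))
)
tract_df <- data.frame(
  tract = c("tract_159", "tract_158", "tract_157", "tract_156"),
  row.names = c("159", "158", "157", "156"),
  stringsAsFactors = FALSE
)

# No matching: row 1 goes with polygon 1, row 2 with polygon 2, and so on
tracts_pos <- SpatialPolygonsDataFrame(polys, tract_df, match.ID = FALSE)

# Put each polygon ID next to the tract label it was paired with
data.frame(
  polygon_id = sapply(tracts_pos@polygons, slot, "ID"),
  tract = tracts_pos@data$tract
)
```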

9. Adding new data

How does this relate to adding data to a Spatial___DataFrame object? Once the object is created, nothing enforces this correspondence. If we replace the data in the data slot, we risk breaking it and ending up with the wrong data associated with each spatial entity. So how do you add new data? One option is to recreate the Spatial___DataFrame object, taking care to match the right polygons to the right rows; the safest way to do this is to enforce matching, providing IDs for the polygons and corresponding rownames in the data frame. The second option is to use the merge function from the sp package. It works much like base R's merge but is designed for spatial objects, and it is the easiest way to ensure you don't lose the correspondence between rows of data and polygons.
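Both options can be sketched on a small hypothetical example, assuming sp is installed. The tract codes, income figures and `make_square()` helper are invented for illustration; the `merge()` call uses `by.x` and `by.y` because the key columns have different names in the two objects.

```r
library(sp)

# Hypothetical helper: a unit square Polygons object with the given ID
make_square <- function(x0, id) {
  ring <- cbind(c(x0, x0, x0 + 1, x0 + 1, x0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}

# Hypothetical tracts: polygon IDs equal the TRACTCE codes in the data slot
codes <- c("T01", "T02", "T03", "T04")
tracts <- SpatialPolygonsDataFrame(
  SpatialPolygons(lapply(seq_along(codes), function(i) make_square(i - 1, codes[i]))),
  data.frame(TRACTCE = codes, row.names = codes, stringsAsFactors = FALSE)
)
# Hypothetical income table listing the same tracts in a different order
income <- data.frame(
  tract = c("T03", "T01", "T04", "T02"),
  estimate = c(52000, 61000, 47000, 58000),
  stringsAsFactors = FALSE
)

# Option 1: recreate the object and let match.ID line rows up with polygons.
# The new rownames must be the polygon IDs (here, the tract codes).
new_data <- income
row.names(new_data) <- new_data$tract
rebuilt <- SpatialPolygonsDataFrame(
  as(tracts, "SpatialPolygons"), new_data, match.ID = TRUE
)

# Option 2: sp's merge() joins on key columns and keeps the polygon order
merged <- merge(tracts, income, by.x = "TRACTCE", by.y = "tract")

rebuilt@data$estimate
# [1] 61000 58000 52000 47000
merged@data$estimate
# [1] 61000 58000 52000 47000
```

Option 1 replaces the whole data slot, so `rebuilt` no longer has a TRACTCE column; `merge()` keeps the existing columns and appends the new ones, which is part of why it is the more convenient route.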

10. Let's practice!

You'll learn about using merge in the following exercises.