Checking data will match
Forcing your data into the data slot doesn't work because you lose the correct correspondence between rows and spatial objects. How do you add the income data to the polygon data? The merge() function in sp is designed exactly for this purpose.
You might have seen merge() before with data frames. sp::merge() has almost exactly the same structure, but you pass it a Spatial*** object and a data frame, and it returns a new Spatial*** object whose data slot is a merge of the original data slot and the data frame. For this merge to work, both the spatial object and the data frame need a column that contains IDs to match on.
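As a reminder of the data frame version, here is a minimal sketch using base merge() on two plain data frames. The objects tracts and income and all their values are hypothetical stand-ins, not the course data. The ID columns have different names, so each one is named explicitly with by.x and by.y; the sp method is expected to be called the same way, with the Spatial*** object as the first argument.

```r
# Hypothetical stand-ins for the attribute table and the income table
tracts <- data.frame(TRACTCE = c("000100", "000200", "000300"),
                     boro = c("Manhattan", "Manhattan", "Bronx"),
                     stringsAsFactors = FALSE)
income <- data.frame(tract = c("000300", "000100", "000200"),
                     estimate = c(41000, 61000, 52000),
                     stringsAsFactors = FALSE)

# The ID columns are named differently, so name each one explicitly
merged <- merge(tracts, income, by.x = "TRACTCE", by.y = "tract")
print(merged)
```

Rows are paired by ID rather than by position, which is why the income rows can arrive in any order.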
Both nyc_tracts and nyc_income have columns with tract IDs, so these are great candidates for merging the two datasets. However, it's always a good idea to check that the proposed IDs are unique and that there is a match for every row in both datasets.
Let's check this before moving on to the merge.
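To see why these checks matter, here is a small sketch with deliberately flawed toy data (hypothetical values, not the course data): one ID is duplicated in income and one tract has no income row. It uses base merge() on plain data frames, so treat the merge result as an illustration only; the sp method may handle these cases differently, for example by raising an error.

```r
# Hypothetical toy data with two deliberate problems:
# "000100" appears twice in income, and "000300" has no income row
tracts <- data.frame(TRACTCE = c("000100", "000200", "000300"),
                     stringsAsFactors = FALSE)
income <- data.frame(tract = c("000100", "000100", "000200"),
                     estimate = c(61000, 60500, 52000),
                     stringsAsFactors = FALSE)

# Are any IDs repeated, and which ones?
has_dups <- any(duplicated(income$tract))
dup_ids <- income$tract[duplicated(income$tract)]

# Does every tract have an income row, and which ones are missing?
all_matched <- all(tracts$TRACTCE %in% income$tract)
missing_ids <- setdiff(tracts$TRACTCE, income$tract)

# With base merge(), the duplicate yields two rows for one tract
# and the unmatched tract is dropped
merged <- merge(tracts, income, by.x = "TRACTCE", by.y = "tract")
print(table(merged$TRACTCE))
```

Either problem breaks the one-row-per-polygon correspondence that the data slot relies on.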
This exercise is part of the course Visualizing Geospatial Data in R.
Exercise instructions
- Use any() with duplicated() on nyc_income$tract to check if every row in nyc_income has a unique tract ID.
- Use any() with duplicated() on nyc_tracts$TRACTCE to check if every row in nyc_tracts has a unique tract ID.
- Use all() on nyc_tracts$TRACTCE %in% nyc_income$tract to check the nyc_tracts tracts are all in nyc_income.
- Use all() on nyc_income$tract %in% nyc_tracts$TRACTCE to check the nyc_income tracts are all in nyc_tracts.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Check for duplicates in nyc_income
# Check for duplicates in nyc_tracts
# Check nyc_tracts in nyc_income
# Check nyc_income in nyc_tracts
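One possible way to fill in the four checks is sketched below. Because the course objects are not available outside the exercise, the sketch runs on two hypothetical stand-in data frames, nyc_income_toy and nyc_tracts_toy, with made-up values; in the exercise you would use nyc_income and nyc_tracts instead, and the same $ column access should work on the spatial object.

```r
# Hypothetical stand-ins for the course objects (made-up values)
nyc_income_toy <- data.frame(tract = c("000300", "000100", "000200"),
                             estimate = c(41000, 61000, 52000),
                             stringsAsFactors = FALSE)
nyc_tracts_toy <- data.frame(TRACTCE = c("000100", "000200", "000300"),
                             stringsAsFactors = FALSE)

# Check for duplicates in nyc_income (FALSE means every ID is unique)
dup_income <- any(duplicated(nyc_income_toy$tract))

# Check for duplicates in nyc_tracts (FALSE means every ID is unique)
dup_tracts <- any(duplicated(nyc_tracts_toy$TRACTCE))

# Check nyc_tracts in nyc_income (TRUE means every tract has income data)
tracts_in_income <- all(nyc_tracts_toy$TRACTCE %in% nyc_income_toy$tract)

# Check nyc_income in nyc_tracts (TRUE means every income row has a tract)
income_in_tracts <- all(nyc_income_toy$tract %in% nyc_tracts_toy$TRACTCE)

print(c(dup_income, dup_tracts, tracts_in_income, income_in_tracts))
```

The IDs are safe to merge on when both duplicate checks return FALSE and both membership checks return TRUE.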