1. Spatial joins
In this video, you will learn how to spatially join two GeoDataFrames.
2. Council districts and school districts
Let's look at an example.
These polygon plots show the 35 council districts on the left and the 9 school districts on the right.
We want to know if any council districts are contained completely within a school district, and if any council districts intersect with any school districts.
GeoPandas has a spatial join method, .sjoin(), that will help us find out.
3. The .sjoin() predicate argument
.sjoin() takes a predicate argument, which specifies the type of spatial join.
You will learn about three values that can be assigned to predicate: intersects, contains, and within.
4. Using .sjoin()
Say we have two GeoDataFrames with overlapping geometries.
One is a single polygon and the other is a collection of black points. Let's see how predicate works with this data.
5. predicate = 'intersects'
You can read the predicate as a word to explain the relationship between the two datasets.
intersects returns all observations where the blue_region intersects points.
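As a minimal sketch of this case, the block below builds a toy blue_region and a toy points GeoDataFrame and joins them with predicate='intersects'. The coordinates, the region and point_id columns, and the result name intersects_gdf are made up for illustration; they are not the data shown on the slide.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy data (assumed for illustration): one square region and four points.
blue_region = gpd.GeoDataFrame(
    {"region": ["blue"]},
    geometry=[Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])],
)
points = gpd.GeoDataFrame(
    {"point_id": ["inside_a", "inside_b", "on_border", "outside"]},
    geometry=[Point(1, 1), Point(2, 3), Point(4, 2), Point(6, 6)],
)

# One row per (region, point) pair whose geometries share at least one location.
# The point sitting on the border counts as intersecting.
intersects_gdf = gpd.sjoin(blue_region, points, predicate="intersects")
print(sorted(intersects_gdf["point_id"]))
```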
6. predicate = 'contains'
'contains' returns observations where the blue_region completely contains points. Notice that points on the border are gone.
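The same idea can be sketched with predicate='contains'. Again the geometries and column names are toy assumptions; the point placed exactly on the border is there to show that it drops out of the result.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy data (assumed for illustration): one square region and four points.
blue_region = gpd.GeoDataFrame(
    {"region": ["blue"]},
    geometry=[Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])],
)
points = gpd.GeoDataFrame(
    {"point_id": ["inside_a", "inside_b", "on_border", "outside"]},
    geometry=[Point(1, 1), Point(2, 3), Point(4, 2), Point(6, 6)],
)

# 'contains' keeps only points in the interior of the region;
# a point that lies on the boundary is not contained.
contains_gdf = gpd.sjoin(blue_region, points, predicate="contains")
print(sorted(contains_gdf["point_id"]))
```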
7. predicate = 'within'
There are no cases where the blue_region is within a point.
But we can reverse the order of the GeoDataFrames in .sjoin() to get points within the region.
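A short sketch of how argument order matters for 'within', using the same kind of toy data (all names and coordinates are assumptions for illustration). With the region on the left there is nothing to return; with the points on the left we get the points that fall inside the region.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy data (assumed for illustration): one square region and four points.
blue_region = gpd.GeoDataFrame(
    {"region": ["blue"]},
    geometry=[Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])],
)
points = gpd.GeoDataFrame(
    {"point_id": ["inside_a", "inside_b", "on_border", "outside"]},
    geometry=[Point(1, 1), Point(2, 3), Point(4, 2), Point(6, 6)],
)

# Region on the left: a polygon is never within a point, so the join is empty.
region_within_points = gpd.sjoin(blue_region, points, predicate="within")

# Points on the left: keeps the points that lie inside the region.
points_within_region = gpd.sjoin(points, blue_region, predicate="within")

print(len(region_within_points))
print(sorted(points_within_region["point_id"]))
```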
8. The .sjoin() predicate argument - within
To answer our questions (how many council districts are completely contained in a school district, and how many council district-school district intersections exist), we'll create three new GeoDataFrames, one for each type of spatial join.
within_gdf results from spatially joining council_districts and school_districts with predicate = 'within'.
It returns 11 rows - observations where a council district is within a school district.
9. The .sjoin() predicate argument - contains
contains_gdf is the GeoDataFrame that results from spatially joining school districts and council districts with predicate equal to contains.
Notice we have switched the order of the first two arguments. In the first example we were looking for council_districts within school_districts. In this case, we want school_districts that contain council_districts.
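As expected, we have the same count: 11 rows. Each row pairs a school district with a council district it contains. Since there are only 9 school districts, at least one school district appears in more than one row.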
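The symmetry between 'within' and 'contains' can be sketched on toy data. The two school districts and four council districts below are invented rectangles, and the district labels are hypothetical; only the variable names council_districts, school_districts, within_gdf, and contains_gdf follow the narration.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy districts (assumed for illustration): two school districts side by side
# and four council districts, one of which straddles both school districts.
school_districts = gpd.GeoDataFrame(
    {"district": ["S1", "S2"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)
council_districts = gpd.GeoDataFrame(
    {"district": ["C1", "C2", "C3", "C4"]},
    geometry=[box(1, 1, 4, 4), box(5, 5, 9, 9), box(8, 1, 12, 4), box(12, 5, 18, 9)],
)

# Council districts (left) that are within a school district (right).
within_gdf = gpd.sjoin(council_districts, school_districts, predicate="within")

# School districts (left) that contain a council district (right).
contains_gdf = gpd.sjoin(school_districts, council_districts, predicate="contains")

# Both joins describe the same (council, school) pairs, so the row counts match.
within_pairs = set(zip(within_gdf["district_left"], within_gdf["district_right"]))
contains_pairs = set(zip(contains_gdf["district_right"], contains_gdf["district_left"]))
print(len(within_gdf), len(contains_gdf))
print(within_pairs == contains_pairs)
```

Note that the row count is a count of pairs: in this toy data three rows come back from only two school districts.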
10. The .sjoin() predicate argument - intersects
intersect_gdf gets the spatial join of council districts and school districts with predicate equal to intersects.
There are 100 observations of a council district intersecting a school district.
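To see why an 'intersects' join usually returns more rows than a 'within' join, here is a sketch on invented rectangles (the geometries and district labels are assumptions, not the real districts). A council district that straddles two school districts shows up once per school district it touches.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy districts (assumed for illustration).
school_districts = gpd.GeoDataFrame(
    {"district": ["S1", "S2"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)
council_districts = gpd.GeoDataFrame(
    {"district": ["C1", "C2", "C3", "C4"]},
    geometry=[box(1, 1, 4, 4), box(5, 5, 9, 9), box(8, 1, 12, 4), box(12, 5, 18, 9)],
)

# One row per (council district, school district) pair that overlaps or touches.
intersect_gdf = gpd.sjoin(council_districts, school_districts, predicate="intersects")

# C3 crosses the S1/S2 boundary, so it appears in two rows.
rows_per_council = intersect_gdf["district_left"].value_counts()
print(len(intersect_gdf))
print(rows_per_council["C3"])
```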
11. Columns in a spatially joined GeoDataFrame
Let's take a look at one of the spatially joined GeoDataFrames.
Here are the first 4 columns of the within_gdf GeoDataFrame - the council districts that are contained completely within a school district.
Notice that _left is part of the column name for the first 3 columns. This tells us that these first names, last names, and district numbers come from the left-hand GeoDataFrame in our spatial join: the council_districts.
Columns named first_name_right, last_name_right, and district_right have also been created from the school district data. They just aren't shown here due to space constraints.
index_right holds the index value of the matching row in the original right-hand GeoDataFrame: the school_districts. Row index values for the left-hand GeoDataFrame are retained as the row indices for the new within_gdf GeoDataFrame.
Note that the geometry column of a spatially joined GeoDataFrame comes from the first (or left) GeoDataFrame passed to the .sjoin() method; the right-hand geometry is not kept.
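The column behavior described above can be checked on a small example. This is a sketch on invented rectangles with placeholder last_name values; it assumes both inputs share the column names last_name and district, so that .sjoin() has to disambiguate them with suffixes.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy districts (assumed for illustration); names are placeholders.
school_districts = gpd.GeoDataFrame(
    {"last_name": ["School-A", "School-B"], "district": ["S1", "S2"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)
council_districts = gpd.GeoDataFrame(
    {"last_name": ["Council-A", "Council-B"], "district": ["C1", "C2"]},
    geometry=[box(1, 1, 4, 4), box(12, 5, 18, 9)],
)

within_gdf = gpd.sjoin(council_districts, school_districts, predicate="within")

# Shared column names get _left / _right suffixes; index_right records
# which row of the right-hand GeoDataFrame was matched.
print(list(within_gdf.columns))

# The row index and the geometry both come from the left-hand GeoDataFrame.
print(list(within_gdf.index))
same_geometry = within_gdf.geometry.loc[0].equals(council_districts.geometry.loc[0])
print(same_geometry)
```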
12. Aggregating spatially joined data
You can work with within_gdf to find how many council districts are entirely within a school district.
Let's rename district_left and district_right to council_district and school_district to make the next steps easier to follow.
First, subset the within_gdf GeoDataFrame to just council_district and school_district.
Group by the school districts,
and then aggregate to get a count of council districts that are in each school district.
Finally, we can use the .sort_values() method to see which school district contains the most council districts.
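The steps above can be sketched end to end on toy data. The rectangles and district labels are invented for illustration, and the intermediate names districts and counts are hypothetical; the rename targets council_district and school_district follow the narration.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy districts (assumed for illustration).
school_districts = gpd.GeoDataFrame(
    {"district": ["S1", "S2"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)
council_districts = gpd.GeoDataFrame(
    {"district": ["C1", "C2", "C3", "C4"]},
    geometry=[box(1, 1, 4, 4), box(5, 5, 9, 9), box(8, 1, 12, 4), box(12, 5, 18, 9)],
)

within_gdf = gpd.sjoin(council_districts, school_districts, predicate="within")

# Rename the suffixed columns so the later steps are easier to read.
within_gdf = within_gdf.rename(
    columns={"district_left": "council_district", "district_right": "school_district"}
)

# Keep only the two columns needed for counting.
districts = within_gdf[["council_district", "school_district"]]

# Count council districts per school district, largest count first.
counts = (
    districts.groupby("school_district")
    .agg("count")
    .sort_values("council_district", ascending=False)
)
print(counts)
```

After the aggregation, the council_district column holds the number of council districts inside each school district, so sorting on it puts the school district with the most council districts in the first row.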
13. Let's Practice!
You've seen how to spatially join GeoDataFrames in three different ways. Now it's time to practice.