The spatial join operation
1. The "spatial join" operation
In this video, we will introduce the concept of a "spatial join", building further upon the spatial relationships we have seen in the previous exercises.
2. Spatial relationships I
We return to the countries and cities datasets. Here, you have a basic visualisation of both datasets together. We can see that most cities fall within one of the countries, and that most countries contain one or more cities. These relationships are intrinsically spatial, and we can exploit them to augment one dataset with the information contained in the other.
3. Spatial relationships II
Let's explore this spatial relationship by building on code we have seen in the previous exercises. Let's take the country of Brazil as an example. We locate and extract its geometry from the countries dataframe. We can now use one of the spatial operations provided by GeoPandas to check the spatial relationship. In this case, we want to check which of the cities are located within the area of Brazil, and for that we can use the "within" method and a boolean filtering operation. But what if we wanted to perform this same operation for every country? For example, we might want to know, for each city, the country in which it is located. In tabular terms, this means adding a column to our cities dataframe with the name of the country in which each city is located. Since the country names are contained in the countries dataset, we need to combine, or "join", information from both datasets. Joining on location (rather than on a shared column) is called a "spatial join".
4. The Spatial Join
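The Brazil example from slide 3 can be sketched in code as follows. This is a minimal sketch, assuming small hypothetical stand-ins for the countries and cities datasets: the square "country" shapes and the city coordinates are made up, not real borders or locations. The final loop shows the brute-force answer to "which country is each city in?", testing every city against every country in turn; a spatial join does the same job in a single call.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical stand-ins for the course datasets: made-up squares, not real borders.
countries = geopandas.GeoDataFrame(
    {"name": ["Brazil", "Argentina"]},
    geometry=[box(0, 0, 10, 10), box(0, -10, 10, 0)],
)
# Made-up city coordinates; "Lisbon" lies outside both squares.
cities = geopandas.GeoDataFrame(
    {"name": ["Brasília", "Recife", "Buenos Aires", "Lisbon"]},
    geometry=[Point(5, 5), Point(8, 2), Point(5, -5), Point(20, 5)],
)

# Locate Brazil's row and extract its single shapely geometry.
brazil = countries.loc[countries["name"] == "Brazil", "geometry"].iloc[0]

# within() returns one boolean per city; use it as a boolean filter.
in_brazil = cities.within(brazil)
cities_in_brazil = cities[in_brazil]
print(cities_in_brazil["name"].tolist())

# Brute force for every country: test all cities against each country in turn.
cities["country"] = None
for _, country in countries.iterrows():
    mask = cities.within(country["geometry"])
    cities.loc[mask, "country"] = country["name"]
print(cities[["name", "country"]])
```

The loop works, but it scales with the number of countries and has to be written by hand for every attribute we want to carry over, which is the motivation for a dedicated join operation.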
In a spatial join operation, we transfer information (data attributes) from one dataframe to another based on their spatial relationship. Consider the small example shown here. We have one dataset with 3 polygons (A, B and C) and another dataset with 4 points (1, 2, 3 and 4). A "spatial join" will link attributes stored in the polygons GeoDataFrame to the points GeoDataFrame based on which polygon contains each point. In our example, attributes from polygon A would be joined to point 1, and those from polygon B to points 2 and 4. This is an example of what we call a "left join": it keeps the rows and order of the left dataset (the points), so a point that falls within no polygon is still kept, with empty polygon attributes.
5. The spatial join with GeoPandas
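The polygon-and-point example from slide 4 can be reproduced with GeoPandas' sjoin function. The coordinates below are a hypothetical layout chosen for illustration: three squares, with point 3 placed outside every polygon and polygon C left empty. Passing how="left" requests the left join described above. The predicate keyword assumes GeoPandas 0.10 or later; older versions called it op.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical layout: three square polygons A, B and C.
polygons = geopandas.GeoDataFrame(
    {"polygon": ["A", "B", "C"]},
    geometry=[box(0, 0, 2, 2), box(3, 0, 5, 2), box(6, 0, 8, 2)],
)
# Four points: 1 in A, 2 and 4 in B, 3 outside every polygon (assumed placement).
points = geopandas.GeoDataFrame(
    {"point": [1, 2, 3, 4]},
    geometry=[Point(1, 1), Point(4, 1), Point(1, 5), Point(4.5, 0.5)],
)

# Left join: every point is kept, with the attributes of the polygon
# it falls within (missing values if there is none).
joined = geopandas.sjoin(points, polygons, how="left", predicate="within")
print(joined[["point", "polygon"]])
```

Polygon C contains no points, so its attributes never appear in the result; the join is driven entirely by the rows of the left (points) frame.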
So how do we do this with GeoPandas? The spatial join operation is available as the sjoin function. The first argument we specify is the geodataframe to which we want to add information, in our case the cities. The second argument is the geodataframe that contains the information we want to add. Finally, we specify which spatial relationship we want to use to match both datasets, through the predicate keyword (called op in GeoPandas versions before 0.10). In our case this is the "within" operation, as we want GeoPandas to check whether rows in the table on the left (cities) are "within" those in the table on the right (countries), joining those where that is the case. Notice that the order of the arguments matters here: first the cities, then the countries, as the "within" relationship only makes sense in that order. Now let's look at the result. In our joined geodataframe, we can find country information next to the original attributes of each city. Crucially, the resulting geodataframe retains the geometry and order of the left input geodataframe, in this case the cities.
6. Let's practice!
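Before practising on the Paris datasets, here is a sketch of the sjoin call from slide 5, again on hypothetical stand-in data (made-up shapes and coordinates, not real borders). Note that sjoin performs an inner join by default, so a city that falls within no country is dropped; passing how="left" keeps it with missing country attributes. Because both inputs have a "name" column, GeoPandas appends its default suffixes, giving name_left and name_right. The last call swaps the argument order and uses the inverse "contains" relationship, which returns the countries' geometries rather than the cities'.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical stand-ins: made-up squares and coordinates, not real data.
countries = geopandas.GeoDataFrame(
    {"name": ["Brazil", "Argentina"], "continent": ["South America", "South America"]},
    geometry=[box(0, 0, 10, 10), box(0, -10, 10, 0)],
)
cities = geopandas.GeoDataFrame(
    {"name": ["Brasília", "Recife", "Buenos Aires", "Lisbon"]},
    geometry=[Point(5, 5), Point(8, 2), Point(5, -5), Point(20, 5)],
)

# Left frame (cities) receives attributes from the right frame (countries)
# wherever a city lies within a country. Default how="inner" drops "Lisbon".
cities_with_country = geopandas.sjoin(cities, countries, predicate="within")
print(cities_with_country[["name_left", "name_right", "continent"]])

# how="left" keeps every city; unmatched ones get missing country attributes.
all_cities = geopandas.sjoin(cities, countries, how="left", predicate="within")

# Swapping the order needs the inverse predicate; the result then carries
# the countries' polygon geometries instead of the city points.
countries_with_cities = geopandas.sjoin(countries, cities, predicate="contains")
```

Whichever frame is passed first determines the geometry of the output, which is why the cities go on the left when the goal is to annotate each city with its country.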
Let's now put this into practice using our Paris datasets.