1. Choropleths with geopandas
We will look at two ways to create a choropleth from data. In the first approach we will use geopandas. But before we get to that, let's talk about sequential colormaps.
2. Sequential colormaps
Earlier in this course we looked at the qualitative colormaps that are provided by matplotlib. Here are the basic sequential colormaps that matplotlib gives us. A sequential colormap shows a clear progression from low to high lightness values across one color or two or more related colors.
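To make the idea of a lightness progression concrete, here is a minimal sketch that samples matplotlib's built-in 'BuGn' colormap at five positions and prints an approximate lightness for each. The relative_luminance helper and the choice of five sample positions are illustrative assumptions, not part of the course code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# 'BuGn' is one of matplotlib's built-in sequential colormaps.
cmap = plt.get_cmap("BuGn")


def relative_luminance(rgba):
    """Approximate perceived lightness of an RGBA color (0 = dark, 1 = light)."""
    r, g, b, _ = rgba
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Sample five evenly spaced positions from the low end (0.0) to the high end (1.0).
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
lightness = [relative_luminance(cmap(p)) for p in positions]

for p, value in zip(positions, lightness):
    print(f"position {p:.2f}: lightness {value:.2f}")
```

For this colormap the lightness changes steadily in one direction from one end to the other, which is what lets a reader rank regions by shade alone.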
3. Choropleth with GeoDataFrame.plot()
Geopandas chooses a quantitative colormap when the column plotted has numerical data. To plot a choropleth, just set the column argument of .plot() to the name of a column with normalized numerical data. Here we use the school_density column that we created by dividing the count of schools in each district by that district's area in decimal degrees squared.
The default colormap, called viridis, is a sequential map, but we'll change to a colormap that is more appropriate for a choropleth: one that stays in or near the same color family.
4. Choropleth with GeoDataFrame.plot()
Setting cmap to 'BuGn' tells geopandas to use the BlueGreen colormap.
Regions with similar values can sometimes be hard to differentiate. We set edgecolor equal to 'black' here to outline the school district polygons.
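The plotting call described here might look like the sketch below. The three square districts and their school_density values are made up so that the example runs on its own; in the course, the GeoDataFrame comes from the merged school and district data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the districts data: three adjacent square polygons.
districts = gpd.GeoDataFrame(
    {
        "district": ["A", "B", "C"],
        "school_density": [0.5, 2.0, 3.5],  # made-up values
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# column picks the values to shade by, cmap picks the sequential colormap,
# and edgecolor outlines each polygon so neighbors with similar values stay distinct.
ax = districts.plot(column="school_density", cmap="BuGn", edgecolor="black")
ax.set_title("School density by district")
```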
It is easy to see the relative differences between school districts, but the scale (schools per decimal degree squared) is hard to interpret. Let's look at how we can preserve decimal degrees for longitude and latitude on the x and y axes, but use kilometers squared for our density measure.
5. Area in Kilometers Squared
We'll back up to the point just before we merged the schools and the school_districts. The original CRS for the school_districts was EPSG:4326.
We can change the Coordinate Reference System of the school_districts from EPSG:4326, which uses decimal degrees for distance, to EPSG:3857, which uses meters.
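A minimal sketch of the CRS change, using GeoDataFrame.to_crs(). The two district polygons and their coordinates are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical districts defined in decimal degrees (EPSG:4326).
school_districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(-86.9, 36.1, -86.8, 36.2), box(-86.8, 36.1, -86.7, 36.2)],
    crs="EPSG:4326",
)
print(school_districts.crs)

# Reproject to EPSG:3857, whose coordinates are expressed in meters.
school_districts = school_districts.to_crs(epsg=3857)
print(school_districts.crs)
print(school_districts.geometry.bounds)
```

After the conversion, the bounds are large numbers in meters rather than small numbers in degrees.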
6. Area in Kilometers Squared
Now that the CRS is EPSG:3857, we'll get the area of each school district and store it as a new column in the school_districts GeoDataFrame. We divide by the sqm_to_sqkm variable to get area in square kilometers.
Area in square kilometers is easier to interpret than decimal degrees squared. But now the geometry coordinates are in meters instead of the traditional measure for latitude and longitude: decimal degrees. Let's fix that.
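The area step can be sketched as follows. The district polygon is invented, and sqm_to_sqkm is assumed here to be 10**6 (the number of square meters in a square kilometer), which matches how the narration uses it.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical district, defined in decimal degrees and reprojected to meters.
school_districts = gpd.GeoDataFrame(
    {"district": ["A"]},
    geometry=[box(-86.9, 36.1, -86.8, 36.2)],
    crs="EPSG:4326",
).to_crs(epsg=3857)

# Assumed value: one square kilometer is 1,000,000 square meters.
sqm_to_sqkm = 10**6

# .area is computed in the units of the CRS, so here it is square meters.
school_districts["area"] = school_districts.geometry.area / sqm_to_sqkm
print(school_districts[["district", "area"]])
```

One thing to keep in mind: EPSG:3857 stretches areas more and more with distance from the equator, so these figures are approximations rather than true ground areas.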
7. Latitude and longitude in decimal degrees
We can reconvert school_districts to EPSG:4326 before spatially joining it to the schools.
Now we have a GeoDataFrame with area in kilometers squared and the geometry in decimal degrees.
Next we spatially join the school_districts to the schools GeoDataFrame.
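Here is a sketch of converting back to EPSG:4326 and then running the spatial join. The districts and school points are made up. The predicate argument of gpd.sjoin() is the name used in recent geopandas releases; older releases call the same argument op.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical districts: compute area in EPSG:3857, then return to EPSG:4326.
school_districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(-86.9, 36.1, -86.8, 36.2), box(-86.8, 36.1, -86.7, 36.2)],
    crs="EPSG:4326",
)
school_districts = school_districts.to_crs(epsg=3857)
school_districts["area"] = school_districts.geometry.area / 10**6
school_districts = school_districts.to_crs(epsg=4326)  # the area column is kept

# Hypothetical school locations in decimal degrees.
schools = gpd.GeoDataFrame(
    {"name": ["School 1", "School 2", "School 3"]},
    geometry=[Point(-86.85, 36.15), Point(-86.86, 36.12), Point(-86.75, 36.18)],
    crs="EPSG:4326",
)

# Attach to each school the attributes of the district it falls within.
schools_in_districts = gpd.sjoin(schools, school_districts, predicate="within")
print(schools_in_districts[["name", "district", "area"]])
```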
8. Counting schools in each district
After the spatial join, we group by district and get the size of each group as school_counts.
We convert the school_counts to a DataFrame and merge the school_districts with the school_counts.
Because we have listed the GeoDataFrame first in the pandas merge operation, our resulting merged object is a GeoDataFrame. If we had listed the school_counts_df first, the merge would return a DataFrame.
When you merge counts with a GeoDataFrame and want to create a map with the results, it is important that the GeoDataFrame is the first argument in the merge.
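The counting and merging steps, and the effect of argument order, can be sketched like this. The data is invented, the plain DataFrame schools_in_districts stands in for the spatial-join result, and the column name school_counts is an assumption.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical districts with a precomputed area column (made-up values).
school_districts = gpd.GeoDataFrame(
    {"district": ["A", "B"], "area": [6.0, 9.0]},
    geometry=[box(-86.9, 36.1, -86.8, 36.2), box(-86.8, 36.1, -86.7, 36.2)],
    crs="EPSG:4326",
)

# Stand-in for the spatial-join result: one row per school with its district.
schools_in_districts = pd.DataFrame(
    {"name": ["School 1", "School 2", "School 3"], "district": ["A", "A", "B"]}
)

# Count schools per district, then turn the resulting Series into a DataFrame.
school_counts = schools_in_districts.groupby("district").size()
school_counts_df = school_counts.to_frame(name="school_counts").reset_index()

# GeoDataFrame listed first: the result is still a GeoDataFrame.
districts_with_counts = pd.merge(school_districts, school_counts_df, on="district")

# DataFrame listed first: the result is a plain DataFrame.
counts_first = pd.merge(school_counts_df, school_districts, on="district")

print(type(districts_with_counts).__name__)
print(type(counts_first).__name__)
```

Calling school_districts.merge(school_counts_df, on="district") is an equivalent way to keep the GeoDataFrame on the left.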
9. Calculating school density
Now we can calculate the school_density by dividing the counts by the area, and create a choropleth with .plot(), setting column to school_density.
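A sketch of the density calculation and the final plot. The counts and areas are made-up numbers chosen only to keep the arithmetic easy to follow, and legend=True is an optional extra that adds a colorbar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import geopandas as gpd
from shapely.geometry import box

# Hypothetical merged result: districts with school counts and area in square kilometers.
school_districts = gpd.GeoDataFrame(
    {
        "district": ["A", "B"],
        "school_counts": [3, 1],  # made-up counts
        "area": [6.0, 9.0],       # made-up areas in square kilometers
    },
    geometry=[box(-86.9, 36.1, -86.8, 36.2), box(-86.8, 36.1, -86.7, 36.2)],
    crs="EPSG:4326",
)

# Schools per square kilometer.
school_districts["school_density"] = (
    school_districts["school_counts"] / school_districts["area"]
)

# Axes stay in decimal degrees because the geometry is in EPSG:4326.
ax = school_districts.plot(
    column="school_density", cmap="BuGn", edgecolor="black", legend=True
)
ax.set_title("Schools per square kilometer")
print(school_districts[["district", "school_density"]])
```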
10. School density choropleth
Now we know not only that schools are most densely placed in the central school district, but also that the density there is greater than one school for every three square kilometers!
11. Let's practice!
Now it's your turn to plot a choropleth using geopandas.
You'll use the normalized permit_density column you created in the last exercise. Let's go!