1. Choropleths: Mapping data over space
In this video, we will focus on how to map attribute data using choropleths.
2. Choropleths
Choropleths are maps onto which an attribute, a non-spatial variable, is displayed. We encode its values by using a color scheme where each color represents one or a few values.
This might sound complex, but we have already seen several examples in this course. An additional one is shown here, where the GDP variable is used to assign a color to the polygon of each country.
3. Choropleths
Creating choropleths in Python is simple. We have already used the plot() method of GeoPandas.
When we pass a numerical column to color the polygons, the result is a continuous color scale, as in the previous slide.
However, it is very difficult for the human eye to process small differences in color in a continuous scale.
Therefore, to create effective choropleths, we typically classify the values into a set of discrete groups.
There are three aspects we need to pay attention to: how many groups to use, how to allocate each value to a color, and which color palette to use. Let's explore each in more detail!
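In GeoPandas, these three choices correspond to keyword arguments of the plot() method: k for the number of groups, scheme for the classification algorithm, and cmap for the color palette. The snippet below is a minimal sketch that only assembles those arguments. The helper name choropleth_kwargs, the GeoDataFrame countries, and its "gdp" column are hypothetical, and using scheme assumes the mapclassify package is installed.

```python
def choropleth_kwargs(column, k=5, scheme="quantiles", cmap="OrRd"):
    """Collect the three choropleth choices as keyword arguments for plot().

    Hypothetical helper for illustration; it is not part of GeoPandas.
    """
    return {
        "column": column,   # attribute used to color the polygons
        "k": k,             # how many groups to use
        "scheme": scheme,   # how values are allocated to groups
        "cmap": cmap,       # which matplotlib color palette to use
        "legend": True,     # show the class breaks next to the map
    }


kwargs = choropleth_kwargs("gdp", k=7, scheme="equal_interval", cmap="Blues")
print(kwargs)

# With a GeoDataFrame named `countries` (an assumption), the call would be:
# countries.plot(**kwargs)
```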
4. Number of classes ("k")
Because we encode a potentially large number of values into a small number of colors, choropleths involve a loss of information and detail. However, this also makes the final map more easily interpretable.
Choosing this number of classes, specified by the keyword k, is a trade-off. Use too few classes and the map will barely tell us anything interesting; use too many, and there will be more information than a human can process meaningfully.
Although the right answer depends on each case, research suggests that somewhere between three and twelve categories strikes an adequate balance.
5. Classification algorithms ("scheme")
With the number of groups chosen, the next question is: how do we assign each value to a specific color of the palette?
There exist several techniques. In this course, we are going to explore two of the most widely used ones: equal intervals, and quantiles. Let's look at each in more detail!
6. Equal Intervals
Equal intervals splits the value range into equal segments and assigns a different color to each "bin". As the left figure shows, this creates bins of equal width. However, if our variable is unevenly distributed across the value space (as is the case here), we will end up with maps like the one on the right, where a large group of observations is allocated to the same color.
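To make the idea concrete, here is a minimal pure-Python sketch of equal-interval binning. The function name equal_interval_bins and the gdp values are made up for illustration. It is a simplification: a value that falls exactly on an internal boundary goes to the upper bin, which may differ from how a classification library handles bin edges.

```python
def equal_interval_bins(values, k):
    """Assign each value to one of k equal-width bins, numbered 0 to k - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum would otherwise land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical, unevenly distributed values with one large outlier.
gdp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
bins = equal_interval_bins(gdp, k=5)
counts = [bins.count(b) for b in range(5)]

print(bins)
print(counts)
```

With these made-up values, nine of the ten observations share the lowest bin and three bins stay empty, which is the effect described above.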
7. Quantiles
To get around that issue, the quantile classification ranks all the values and allocates the same proportion to each color bin. For this example with seven groups, each will take 1/7th of the observations. This balances the number of observations per color, but only at the expense of potentially including very dissimilar values in the same color, as the highest bin in the example illustrates.
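The same trade-off can be seen in a minimal pure-Python sketch of quantile binning. The function name quantile_bins and the gdp values are made up for illustration. This version ranks the observations directly, so tied values could be split across bins; a classification library would typically compute break values instead.

```python
def quantile_bins(values, k):
    """Rank the values and give each of k bins roughly the same number of them."""
    n = len(values)
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # The observation at position `rank` goes into bin floor(rank * k / n).
        bins[i] = rank * k // n
    return bins


# Hypothetical, unevenly distributed values with one large outlier.
gdp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
bins = quantile_bins(gdp, k=5)
counts = [bins.count(b) for b in range(5)]
top_bin = [v for v, b in zip(gdp, bins) if b == 4]

print(counts)
print(top_bin)
```

Every bin now holds two observations, but the highest bin groups 9 together with 100, two very dissimilar values under one color.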
8. Color
Finally, we need to select a color palette. The trick here is to pick one that is aligned with the purpose of the map and the nature of the data.
For categorical data, we want palettes that do not imply any gradient or scale of any kind.
If our data are continuous and the minimum value is a natural starting point, we will use a sequential palette.
For continuous data where the natural starting point is in the middle, we can use a diverging palette, which is composed of two different sequential ones moving in opposite directions.
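These three rules can be written down as a small decision helper. This is only a sketch: the function name pick_palette is hypothetical, and the palette names "Set2", "Blues", and "RdBu" are matplotlib colormaps chosen here as one example of each family, not the only reasonable choices.

```python
def pick_palette(is_categorical, has_natural_midpoint=False):
    """Return an example matplotlib colormap name for the given kind of data."""
    if is_categorical:
        # Qualitative palette: distinct colors with no implied order.
        return "Set2"
    if has_natural_midpoint:
        # Two sequential ramps moving away from a central value.
        return "RdBu"
    # Sequential palette: light to dark from the minimum value upwards.
    return "Blues"


print(pick_palette(is_categorical=True))
print(pick_palette(is_categorical=False))
print(pick_palette(is_categorical=False, has_natural_midpoint=True))
```

The returned name could then be passed as the cmap argument when plotting.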
9. Let's practice!
Now that you know the basics of creating effective choropleths, let's practice making them in Python!