1. Categorical palettes
In the last lesson, we talked all about how to customize your color palettes when dealing with continuous data. Now we're going to switch our attention to categorical data.
2. What is categorical data?
As you might expect, categorical data are data whose values fall into distinct groups rather than along a continuous scale. This covers a huge variety of things, from the different cities in our pollution data to different species of birds.
In general, a category occupies a single, unwavering value, for instance a bird's species. You can't have a bird fall one-third of the way between a sparrow and a hawk.
3. Limits in perception
While this lack of continuity in our data means we are free to use different colors for our palettes, we need to be mindful that humans have a hard time distinguishing between lots of colors. A general rule of thumb is that with more than ten categories, people stop being able to easily tell the different hues apart.
Another perception issue that is often forgotten is color blindness. As with continuous palettes, you should assume your viewer can't distinguish between similar shades of red and green, so you should avoid categories whose colors differ only in their red and green hues.
4. Dealing with these limitations
One of the easiest and generally best ways to deal with the issue of too many colors is to decide which of your classes you'd like to focus on and then bin all the other classes into an 'other' bin of some uniform color.
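As a minimal sketch of that binning idea, the snippet below keeps two classes of interest and collapses everything else into a single 'Other' class with a neutral color. The city names, the choice of focus classes, and the colors are illustrative assumptions, not values taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical category column; these names are placeholders for illustration
cities = pd.Series(
    ["Houston", "Long Beach", "Denver", "Fairbanks",
     "Houston", "Cincinnati", "Long Beach"],
    name="city",
)

# The classes we want the viewer to focus on (an assumption for this sketch)
focus = ["Houston", "Long Beach"]

# Keep the focus classes as they are and relabel every other class as "Other"
binned = cities.where(cities.isin(focus), "Other")

# One distinct color per focus class and a single neutral gray for the rest
palette = {"Houston": "tab:orange", "Long Beach": "tab:blue", "Other": "lightgray"}
```

A dictionary like this can be passed to the palette argument of a Seaborn plotting function when the binned column is used as the hue.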
5. Seaborn's categorical palettes
Luckily, Seaborn has a wide variety of built-in, perceptually distinct color palettes. Some of the best come from the ColorBrewer tool by Cynthia Brewer. These palettes are both aesthetically pleasing and built for optimal class distinction.
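For example, a qualitative ColorBrewer palette can be requested by name. The sketch below uses "Set2" with five colors; both the palette name and the count are arbitrary choices made for illustration.

```python
import seaborn as sns

# Ask for five colors from ColorBrewer's qualitative "Set2" palette
qualitative = sns.color_palette("Set2", 5)

# Hex strings are handy for reuse outside of Seaborn
hex_codes = qualitative.as_hex()
print(hex_codes)
```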
6. Ordinal data (a)
One category of data you may notice has been left out is data that has order but also distinct classes. This is commonly referred to as ordinal data.
Ordinal data can take many forms, such as the quantile a value falls in.
7. Ordinal data (b)
Another example is the days of the week, where you always have seven days that fall in the same order.
8. Ordinal data (c)
Many survey results are ordinal as well. A common situation is that participants are asked to rate how happy they are on a scale from 1 to 5, with 1 being the least happy and 5 being the happiest.
9. Building ordinal palettes
As with the regular qualitative palettes, Seaborn bundles palettes from ColorBrewer that work very well for ordinal data. You access them by passing the palette name to sns.color_palette() along with the number of colors you want in your palette.
Unlike the ordered palettes we looked at in the last lesson, the palette you use for ordinal data most often won't have a null or zero value color because the classes typically don't have a null or zero value.
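A short sketch of that call, assuming a five-level happiness survey like the one described earlier: it requests five colors from the sequential "OrRd" palette and pairs each survey level with a color. The level-to-color mapping is an illustrative choice, not something prescribed by the lesson.

```python
import seaborn as sns

# Five ordered colors from ColorBrewer's sequential "OrRd" palette
ordinal = sns.color_palette("OrRd", 5)

# Pair each survey level (1 = least happy ... 5 = happiest) with one color
level_colors = dict(zip(range(1, 6), ordinal))

for level, rgb in level_colors.items():
    print(level, tuple(round(channel, 2) for channel in rgb))
```

Because the palette runs from light to dark, the order of the colors mirrors the order of the classes.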
10. Palette shortcuts
Often, when you are plotting ordinal data, you can simply pass the name of the palette to the palette argument of the plotting function, and Seaborn will automatically choose the right number of colors for you, so you don't have to define the palette manually.
For instance, here we've converted our NO2 values for Long Beach in 2014 to terciles and passed the OrRd palette name to the scatterplot() function. Seaborn figures out how many categories there are and draws that many colors from the palette.
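The shortcut might look roughly like the sketch below. It uses synthetic numbers in place of the lesson's Long Beach measurements, and the column names "day", "NO2", and "NO2_tercile" are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the pollution data (values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "day": np.arange(1, 91),
    "NO2": rng.gamma(shape=2.0, scale=10.0, size=90),
})

# Split NO2 into three equal-sized, ordered classes (terciles)
df["NO2_tercile"] = pd.qcut(df["NO2"], q=3, labels=["low", "medium", "high"])

# Pass only the palette name; Seaborn picks one color per tercile
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="day", y="NO2", hue="NO2_tercile", palette="OrRd", ax=ax)
```

No color count is given anywhere in the plotting call; the three colors follow from the three levels of the hue column.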
11. Let's color some categories
Now that you're familiar with how to think about colors when plotting categorical data, let's build some plots with our pollution data.