
Choosing the right variable to encode with color

You're tasked with visualizing pollution values for Long Beach and nearby cities over time. The supplied code produces the hard-to-read plot below, which shows maximum pollution values (provided as max_pollutant_values) with the bars colored by city.

Multicolored, busy bar plots with four rows, one for each of the four pollutants in the dataset.

You can quickly improve this with a few tweaks. Restricting the cities shown to those in the western half of the country avoids clutter. Next, swapping the color encoding from city to year moves the cities onto the x-axis, where they are labeled directly, and lets you use an ordinal palette for the years. The reader no longer has to keep checking the legend to see which color corresponds to which city.
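To see why year suits an ordinal palette, here is a minimal sketch, assuming seaborn is installed. The year values are hypothetical placeholders (with the real data you would use the unique values of max_pollutant_values.year). It pairs the sorted years with colors from the sequential 'BuGn' ColorBrewer palette and checks that the colors get darker as the years increase, so color order matches time order.

```python
import seaborn as sns

# Hypothetical years; with the real data, use sorted(max_pollutant_values.year.unique())
years = [2013, 2014, 2015, 2016]

# Sequential ColorBrewer palette with one color per year
palette = sns.color_palette("BuGn", n_colors=len(years))

# Pair each year, in order, with its color: the earliest year gets the lightest green
year_to_color = dict(zip(sorted(years), palette))

# Sum of the RGB channels as a rough brightness measure
brightness = [sum(year_to_color[year]) for year in sorted(years)]

# Brightness should fall strictly as the year increases
is_ordered = all(a > b for a, b in zip(brightness, brightness[1:]))
print(is_ordered)
```

A qualitative palette such as 'muted' has no built-in order, so a reader cannot tell which year comes first without the legend. A sequential palette encodes that order in lightness.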

This exercise is part of the course

Improving Your Data Visualizations in Python


Exercise instructions

  • Remove 'Indianapolis', 'Des Moines', 'Cincinnati', and 'Houston' from the cities list.
  • Swap the encodings of the city and year variables.
  • Use the 'BuGn' ColorBrewer palette to map your colors appropriately for the newly ordinal variable.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

import matplotlib.pyplot as plt
import seaborn as sns

# max_pollutant_values is provided by the exercise

# Reduce to just cities in the western half of the US
cities = ['Fairbanks', 'Long Beach', 'Vandenberg Air Force Base', 'Denver',
          'Indianapolis', 'Des Moines', 'Cincinnati', 'Houston']

# Filter data to desired cities
city_maxes = max_pollutant_values[max_pollutant_values.city.isin(cities)]

# Swap city and year encodings
sns.catplot(x='year', hue='city',
            y='value', row='pollutant',
            # Change palette to one appropriate for ordinal categories
            data=city_maxes, palette='muted',
            sharey=False, kind='bar')
plt.show()
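For comparison, here is one way the completed exercise might look, written as a small function so it can run against any DataFrame with city, year, pollutant, and value columns. The function name plot_western_maxes, the constant WESTERN_CITIES, and the toy DataFrame are hypothetical illustrations, not part of the course code. With the real data you would pass max_pollutant_values instead of the toy frame.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical constant: the four western cities kept after removing the others
WESTERN_CITIES = ['Fairbanks', 'Long Beach', 'Vandenberg Air Force Base', 'Denver']


def plot_western_maxes(max_pollutant_values):
    """Filter to western cities and plot values with city on x and year as hue."""
    city_maxes = max_pollutant_values[max_pollutant_values.city.isin(WESTERN_CITIES)]
    # City and year encodings swapped; 'BuGn' suits the ordinal year variable
    g = sns.catplot(x='city', hue='year', y='value', row='pollutant',
                    data=city_maxes, palette='BuGn', sharey=False, kind='bar')
    return city_maxes, g


# Hypothetical toy frame with the same columns the exercise code uses
toy = pd.DataFrame({
    'city': ['Denver', 'Denver', 'Houston', 'Houston'] * 2,
    'year': [2014, 2015, 2014, 2015] * 2,
    'pollutant': ['O3'] * 4 + ['NO2'] * 4,
    'value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

city_maxes, g = plot_western_maxes(toy)
plt.show()
```

Houston is filtered out, and the grid gets one row per pollutant. Because year is now the hue, 'BuGn' shades each city's bars from light (earlier years) to dark (later years).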