Dealing with too many categories

Sometimes you are short on figure space and need to show a lot of data at once. Here you want to show the year-long trajectory of every pollutant for every city in the pollution dataset. Each pollutant trajectory will be plotted as a line whose y-value is the number of standard deviations from the year's average. This means you will have many lines on the plot at once, far more than you could clearly separate with color.
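As a rough illustration of what "standard deviations from the year's average" means, the sketch below standardizes each trajectory with pandas. The DataFrame readings and its columns city_pol, month, raw and value are hypothetical stand-ins, and standardizing separately per city-pollutant combination is an assumption; the exercise data may already arrive in this form.

```python
import pandas as pd

# Hypothetical raw readings: two city-pollutant combinations, three months each.
readings = pd.DataFrame({
    "city_pol": ["A CO"] * 3 + ["B CO"] * 3,
    "month": [1, 2, 3] * 2,
    "raw": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Assumption: each combination is standardized against its own yearly mean and
# standard deviation, so trajectories on different scales become comparable.
grouped = readings.groupby("city_pol")["raw"]
readings["value"] = (
    readings["raw"] - grouped.transform("mean")
) / grouped.transform("std")

print(readings)
```

After this step both combinations share the same scale, which is what lets many trajectories sit on a single y-axis.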

To keep the plot readable, you have decided to highlight a small subset of city-pollutant combinations (wanted_combos). This subset is the most important to you, and the other trajectories provide valuable context for comparison. To focus attention, you will set all the non-highlighted trajectory lines to the same 'other' color.

Modify the list comprehension to isolate the desired combinations of city and pollutant (wanted_combos).
Tell the line plot to color the lines by the newly created color_cats column in your DataFrame.
Use the units argument to specify which column determines how the data points are connected to form each line.
Disable the aggregation of points by setting the estimator argument to None.
