Exercise

Dealing with too many categories

Sometimes you are short on figure space and need to show a lot of data at once. Here you want to show the year-long trajectory of every pollutant for every city in the pollution dataset. Each pollutant trajectory will be plotted as a line, with the y-value corresponding to standard deviations from the year's average. This means you will have a lot of lines on your plot at once, far more than you could clearly separate with color.

To deal with this, you have decided to highlight a small subset of city-pollutant combinations (wanted_combos). This subset is the most important to you, and the other trajectories will provide valuable context for comparison. To focus attention, you will set all the non-highlighted trajectory lines to the same 'other' color.
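The labeling step described above can be sketched as follows. This is a minimal illustration, not the exercise's actual data: the DataFrame contents and the city_pol and value column names are assumptions made up for the example, while wanted_combos and color_cats follow the names used in this exercise.

```python
import pandas as pd

# Hypothetical stand-in for the pollution data: each row is tagged with a
# combined "city pollutant" label. Column names and values are assumptions.
city_pol_month = pd.DataFrame({
    "city_pol": ["CityA NO2", "CityA CO", "CityB NO2", "CityC SO2"],
    "value": [0.4, -1.2, 0.9, 1.5],
})

# The combinations to highlight (hypothetical choices).
wanted_combos = ["CityA NO2", "CityC SO2"]

# Keep the label for wanted combinations; collapse everything else to 'other'.
city_pol_month["color_cats"] = [
    combo if combo in wanted_combos else "other"
    for combo in city_pol_month["city_pol"]
]

print(city_pol_month["color_cats"].tolist())
# ['CityA NO2', 'other', 'other', 'CityC SO2']
```

Because every non-highlighted row shares the single 'other' label, a plot colored by color_cats needs only one color per highlighted combination plus one for the rest.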

Instructions
  • Modify the list comprehension to isolate the desired combinations of city and pollutant (wanted_combos).
  • Tell the line plot to color the lines by the newly created color_cats column in your DataFrame.
  • Use the units argument to determine how, i.e., from which column, the data points should be connected to form each line.
  • Disable the binning (aggregation) of points that share an x-value with the estimator argument.