Programmatically creating a highlight

You are continuing your work for the city of Houston. Now you want to look at how both NO₂ and SO₂ behaved when ozone (O₃), which is not plotted, was at its highest.

To do this, replace the logic in the current list comprehension with one that compares each row's O₃ value with the highest observed O₃ in the dataset. Note: use sns.scatterplot() instead of sns.regplot(), because sns.scatterplot() accepts a non-color vector as its hue argument, colors the points automatically, and adds a helpful legend.
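The comparison logic can be sketched on a small made-up table. The `toy` DataFrame and its values below are hypothetical stand-ins, not the course's pollution data; only the column name `O3` and the two labels mirror the exercise.

```python
import pandas as pd

# Hypothetical stand-in data; the real exercise uses houston_pollution.
toy = pd.DataFrame({
    "day": [1, 2, 3, 4],
    "O3": [0.031, 0.058, 0.044, 0.039],
})

# Highest observed O3 value in the toy table.
peak_o3 = toy["O3"].max()

# Label each row by comparing its O3 value with the maximum.
toy["point_type"] = [
    "Highest O3 Day" if o3 == peak_o3 else "Others" for o3 in toy["O3"]
]

print(toy)
```

Because the maximum is taken from the same column it is compared against, the equality check is exact. If several rows tie for the maximum, each of them receives the highlight label.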

This exercise is part of the course

Improving Your Data Visualizations in Python

Exercise instructions

Find the highest observed O3 value in the houston_pollution DataFrame. Make sure to type the letter O and not the number zero!
Append the column 'point_type' to the houston_pollution DataFrame to mark whether each row contains the highest observed O₃.
Pass this newly created column to the hue argument of sns.scatterplot() to color the points.
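As a rough illustration of the last step, the sketch below passes a column of text labels to hue on a small made-up table. The `toy` DataFrame and its hard-coded labels are hypothetical; the non-interactive Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with a ready-made label column.
toy = pd.DataFrame({
    "NO2": [10.0, 14.0, 9.0, 17.0],
    "SO2": [2.0, 3.5, 1.5, 4.0],
    "point_type": ["Others", "Highest O3 Day", "Others", "Others"],
})

# hue receives category labels, not colors; seaborn assigns one color
# per distinct label and builds the legend from those labels.
ax = sns.scatterplot(x="NO2", y="SO2", hue="point_type", data=toy)

# Collect the legend text to confirm both labels are shown.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

In an interactive session, calling plt.show() after the plot (with matplotlib.pyplot imported as plt) displays the figure, as in the exercise code.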

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Assumes the course environment has already loaded the `pollution` DataFrame
import matplotlib.pyplot as plt
import seaborn as sns

houston_pollution = pollution[pollution.city == 'Houston'].copy()

# Find the highest observed O3 value
max_O3 = houston_pollution.O3.____

# Make a column that denotes which day had the highest O3
houston_pollution['____'] = ['Highest O3 Day' if ____ == ____ else 'Others' for O3 in houston_pollution.O3]

# Encode the hue of the points with the O3-derived column
sns.scatterplot(x='NO2',
                y='SO2',
                hue='____',
                data=houston_pollution)
plt.show()


How do you show all of your data while making sure that viewers don't miss an important point or points? Here we discuss how to guide your viewer through the data with color-based highlights and text. We also introduce a dataset on common pollutant values across the United States.
