What state is the most market-friendly?

While exploring the farmers market data, you wonder what patterns might show up if you aggregated to the state level. Are some states more market-friendly than others? To investigate this, you group your data by state and compute the log-transformed number of markets (log_markets) and state population (log_pop).

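# log is assumed to be NumPy's natural logarithm
from numpy import log

markets_and_pop = (markets
    .groupby('state', as_index = False)
    .agg({
        # Number of markets in the state, log-transformed
        'name': lambda d: log(len(d)),
        # Population is repeated on every row of a state, so take the first value
        'state_pop': lambda d: log(d.iloc[0]) })
    .rename(columns = {
        'name': 'log_markets',
        'state_pop': 'log_pop' }))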
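
To see what this aggregation produces, here is a minimal sketch on a made-up table. The market names, states, and population figures are hypothetical placeholders, not values from the course dataset, and the sketch uses pandas named aggregation instead of the dict-plus-rename form above; the resulting columns are the same.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per market, with the state's
# population repeated on every row belonging to that state.
markets = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "state": ["Vermont", "Vermont", "Vermont", "Texas", "Texas", "Ohio"],
    "state_pop": [600_000, 600_000, 600_000,
                  29_000_000, 29_000_000, 11_700_000],
})

markets_and_pop = (
    markets
    .groupby("state", as_index=False)
    .agg(
        # log of the number of markets in each state
        log_markets=("name", lambda d: np.log(len(d))),
        # log of the (repeated) state population
        log_pop=("state_pop", lambda d: np.log(d.iloc[0])),
    )
)

print(markets_and_pop)
```

The result has one row per state, sorted alphabetically by state, with the columns state, log_markets, and log_pop.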

To visualize, you decide to use a regression plot to get an idea of the 'normal' relationship between market and population numbers and a text-scatter to quickly identify interesting outliers.

This exercise is part of the course

Improving Your Data Visualizations in Python

Exercise instructions

Iterate over the rows of the markets_and_pop DataFrame.
Place annotations next to their scatter plot points.
Reduce annotation text size to 10 points.

Hands-on interactive exercise

Try this exercise by completing this sample code.

import matplotlib.pyplot as plt
import seaborn as sns

g = sns.regplot(
    x = "log_markets", y = "log_pop",
    ci = False,
    # Shrink scatter plot points
    scatter_kws = {'s':2},
    data = markets_and_pop)

# Iterate over the rows of the data
for _, row in markets_and_pop.____():
    state, _, _, log_markets, log_pop = row
    # Place annotation and reduce size for clarity
    g.annotate(state, (____,____), ____ = ____)

plt.show()
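
For reference, here is one possible way the three instructions could fit together. It is a hedged sketch rather than the course's official solution. The DataFrame values are hypothetical placeholders, rows are read by column name instead of being unpacked by position (so the sketch does not depend on how many columns the real markets_and_pop has), and ci=None is used as the documented way to skip the confidence band.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical values, for illustration only
markets_and_pop = pd.DataFrame({
    "state": ["Vermont", "Ohio", "Texas", "Wyoming"],
    "log_markets": [4.4, 5.7, 5.3, 3.9],
    "log_pop": [13.3, 16.3, 17.2, 13.3],
})

# regplot returns the matplotlib Axes it drew on
g = sns.regplot(
    x="log_markets", y="log_pop",
    ci=None,
    # Shrink scatter plot points so the labels stay readable
    scatter_kws={"s": 2},
    data=markets_and_pop)

# Iterate over the rows of the data
for _, row in markets_and_pop.iterrows():
    # Place each state's name at its point, in 10-point text
    g.annotate(row["state"], (row["log_markets"], row["log_pop"]), size=10)

plt.show()
```

States whose labels sit far from the fitted line are the candidates for "interesting outliers" worth a closer look.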

How do you show all of your data while making sure that viewers don't miss an important point or points? Here we discuss how to guide your viewer through the data with color-based highlights and text. We also introduce a dataset on common pollutant values across the United States.

Exercise 1: Highlighting data Exercise 2: Hardcoding a highlight Exercise 3: Programmatically creating a highlight Exercise 4: Comparing groups Exercise 5: Comparing with two KDEs Exercise 6: Improving your KDEs Exercise 7: Beeswarms Exercise 8: Annotations Exercise 9: A basic text annotation Exercise 10: Arrow annotations Exercise 11: Combining annotations and color

Color is a powerful tool for encoding values in data visualization. However, with this power comes danger. In this chapter, we talk about how to choose an appropriate color palette for your visualization based on the type of data it shows.

Exercise 1: Color in visualizations Exercise 2: Getting rid of unnecessary color Exercise 3: Fixing Seaborn's bar charts Exercise 4: Continuous color palettes Exercise 5: Making a custom continuous palette Exercise 6: Customizing a diverging palette heatmap Exercise 7: Adjusting your palette according to context Exercise 8: Categorical palettes Exercise 9: Using a custom categorical palette Exercise 10: Dealing with too many categories Exercise 11: Coloring ordinal categories Exercise 12: Choosing the right variable to encode with color

Uncertainty occurs everywhere in data science, but it's frequently left out of visualizations where it should be included. Here, we review what confidence intervals are and how to visualize them for both single estimates and continuous functions. Additionally, we discuss the bootstrap resampling technique for assessing uncertainty and how to visualize it properly.

Exercise 1: Point estimate intervals Exercise 2: Basic confidence intervals Exercise 3: Annotating confidence intervals Exercise 4: Confidence bands Exercise 5: Making a confidence band Exercise 6: Separating a lot of bands Exercise 7: Cleaning up bands for overlaps Exercise 8: Beyond 95% Exercise 9: 90, 95, and 99% intervals Exercise 10: 90 and 95% bands Exercise 11: Using band thickness instead of coloring Exercise 12: Visualizing the bootstrap Exercise 13: The bootstrap histogram Exercise 14: Bootstrapped regressions Exercise 15: Lots of bootstraps with beeswarms

Visualization is often taught in isolation, with best practices discussed only in a general way. In reality, you will need to bend the rules for different scenarios. In this chapter, we dive into how to optimize your visualizations at each step of a data science workflow, from messy exploratory plots to polishing the font sizes of your final product.

Exercise 1: First explorations Exercise 2: Looking at the farmers market data Exercise 3: Scatter matrix of numeric columns Exercise 4: Digging in with basic transforms Exercise 5: Exploring the patterns Exercise 6: Is latitude related to months open? Exercise 7: What state is the most market-friendly?

Current exercise

Exercise 8: Popularity of goods sold by state Exercise 9: Making your visualizations efficient Exercise 10: Stacking to find trends Exercise 11: Using a plot as a legend Exercise 12: Tweaking your plots Exercise 13: Cleaning up the background Exercise 14: Remixing a plot Exercise 15: Enhancing legibility Exercise 16: Congrats!