Comparing with two KDEs

Imagine you work for the leading supplier of air filters. Your company has asked you to produce a report examining why 2012 was a particularly good year for sales of your ozone (O₃) filters. You have downloaded some helpful pollution data from the USGS and want to build a concise visualization that compares the general pattern of O₃ pollution in 2012 with all other years on record.

To do this, you can build two overlaid kernel density estimates (KDEs): one for the O₃ data from 2012 and one for all other years.
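As background, a KDE places a small smooth bump (a kernel) on every observation and averages the bumps into one density curve. The following is a minimal NumPy sketch of a Gaussian KDE, not seaborn's implementation. The helper name `gaussian_kde_1d`, the hand-picked bandwidth, and the synthetic samples are all assumptions made for illustration; seaborn chooses a bandwidth automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel evaluated at those distances
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (samples.size * bandwidth)

# Synthetic stand-ins for two groups of O3 readings (hypothetical values)
rng = np.random.default_rng(0)
samples_a = rng.normal(0.05, 0.010, 200)
samples_b = rng.normal(0.04, 0.015, 1000)

grid = np.linspace(-0.05, 0.15, 801)
density_a = gaussian_kde_1d(samples_a, grid, bandwidth=0.005)
density_b = gaussian_kde_1d(samples_b, grid, bandwidth=0.005)

# Trapezoid-rule area under each curve; a density should integrate to about 1
area_a = float(np.sum((density_a[1:] + density_a[:-1]) / 2.0 * np.diff(grid)))
area_b = float(np.sum((density_b[1:] + density_b[:-1]) / 2.0 * np.diff(grid)))
```

Because each curve is normalized to unit area, two groups of very different sizes (here 200 versus 1000 samples) can be overlaid on the same axes and compared by shape.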

This exercise is part of the course

<Kurs>Improving Your Data Visualizations in Python</Kurs>

Exercise instructions

Filter the data in the first sns.kdeplot() call so that it includes only the year 2012.
Shade the area under the first KDE using the shade argument.
Add the label '2012' for the legend.
Repeat the first three steps for the second sns.kdeplot() call, but this time filter the data so that 2012 is excluded. Use the label 'other years'.
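The two filters in these steps are complementary, so together they should cover every row exactly once. A small pandas sketch of that idea follows; the tiny `pollution` frame is made-up data that only borrows the `year` and `O3` column names from the exercise, and the mask name `is_2012` is hypothetical.

```python
import pandas as pd

# Made-up stand-in for the course's pollution DataFrame (assumption)
pollution = pd.DataFrame({
    "year": [2011, 2012, 2012, 2013, 2014, 2012],
    "O3": [0.031, 0.052, 0.047, 0.029, 0.035, 0.050],
})

# Boolean mask: True for rows recorded in 2012
is_2012 = pollution.year == 2012

# Select the O3 column for each side of the mask
o3_2012 = pollution[is_2012].O3
o3_other = pollution[~is_2012].O3   # ~ inverts the mask, same as year != 2012

# The two subsets should partition the data
n_2012, n_other = len(o3_2012), len(o3_other)
```

Building the mask once and inverting it with `~` keeps the two subsets in sync if the highlighted year ever changes.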

Interactive hands-on exercise

Try this exercise by completing this sample code.

# Filter dataset to the year 2012
sns.kdeplot(pollution[pollution.year ____ ____].O3, 
            # Shade under kde and add a helpful label
            shade = ____,
            ____ = '____')

# Filter dataset to everything except the year 2012
sns.kdeplot(pollution[pollution.year ____ ____].O3, 
            # Again, shade under kde and add a helpful label
            shade = ____,
            ____ = '____')
plt.show()

How do you show all of your data while making sure that viewers don't miss an important point or points? Here we discuss how to guide your viewer through the data with color-based highlights and text. We also introduce a dataset on common pollutant values across the United States.

Exercise 1: Highlighting your data
Exercise 2: Hardcoding a highlight
Exercise 3: Creating a highlight programmatically
Exercise 4: Comparing groups
Exercise 5: Comparing with two KDEs (current exercise)
Exercise 6: Improving your KDEs
Exercise 7: Beeswarms
Exercise 8: Annotations
Exercise 9: A basic text annotation
Exercise 10: Arrow annotations
Exercise 11: Combining annotations and color

Color is a powerful tool for encoding values in data visualization. However, with this power comes danger. In this chapter, we talk about how to choose an appropriate color palette for your visualization based on the type of data it shows.

Exercise 1: Color in visualizations
Exercise 2: Getting rid of unnecessary color
Exercise 3: Fixing Seaborn's bar charts
Exercise 4: Continuous color palettes
Exercise 5: Making a custom continuous palette
Exercise 6: Customizing a diverging palette heatmap
Exercise 7: Adjusting your palette according to context
Exercise 8: Categorical palettes
Exercise 9: Using a custom categorical palette
Exercise 10: Dealing with too many categories
Exercise 11: Coloring ordinal categories
Exercise 12: Choosing the right variable to encode with color

Uncertainty occurs everywhere in data science, but it's frequently left out of visualizations where it should be included. Here, we review what a confidence interval is and how to visualize one for both single estimates and continuous functions. Additionally, we discuss the bootstrap resampling technique for assessing uncertainty and how to visualize it properly.

Exercise 1: Point estimate intervals
Exercise 2: Basic confidence intervals
Exercise 3: Annotating confidence intervals
Exercise 4: Confidence bands
Exercise 5: Making a confidence band
Exercise 6: Separating a lot of bands
Exercise 7: Cleaning up bands for overlaps
Exercise 8: Beyond 95%
Exercise 9: 90, 95, and 99% intervals
Exercise 10: 90 and 95% bands
Exercise 11: Using band thickness instead of coloring
Exercise 12: Visualizing the bootstrap
Exercise 13: The bootstrap histogram
Exercise 14: Bootstrapped regressions
Exercise 15: Lots of bootstraps with beeswarms

Visualization is often taught in isolation, with best practices discussed only in a general way. In reality, you will need to bend the rules for different scenarios. From messy exploratory visualizations to polishing the font sizes of your final product, this chapter dives into how to optimize your visualizations at each step of a data science workflow.

Exercise 1: First explorations
Exercise 2: Looking at the farmers market data
Exercise 3: Scatter matrix of numeric columns
Exercise 4: Digging in with basic transforms
Exercise 5: Exploring the patterns
Exercise 6: Is latitude related to months open?
Exercise 7: What state is the most market-friendly?
Exercise 8: Popularity of goods sold by state
Exercise 9: Making your visualizations efficient
Exercise 10: Stacking to find trends
Exercise 11: Using a plot as a legend
Exercise 12: Tweaking your plots
Exercise 13: Cleaning up the background
Exercise 14: Remixing a plot
Exercise 15: Enhancing legibility
Exercise 16: Congrats!