
Understanding the shuffle bottleneck

Your manager at Global Retail Analytics asks you to enrich the online_retail dataset (over 100,000 rows) with region data from country_lookup (44 rows). You write a standard left join, but the query takes several minutes. A senior engineer checks your .explain() output, spots ShuffleExchange nodes, and tells you: "The shuffle is your bottleneck."

What does the senior engineer mean by "the shuffle"?
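As a rough illustration of the idea behind the question, the sketch below models a shuffle in plain Python rather than in Spark. It assumes that a shuffle join redistributes rows from both tables across partitions by hashing the join key, and that the alternative is to copy the small lookup table to every partition. The function names, the four-partition layout, the 1,200 sample rows, and the region values are all made up for this example; none of them come from the exercise or from Spark's API.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Deterministic hash so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_left_join(left_parts, right_parts):
    """Toy shuffle join: rows from BOTH sides are re-partitioned by key."""
    n = len(left_parts)
    moved = 0
    left_buckets = defaultdict(list)
    right_buckets = defaultdict(dict)
    for src, part in enumerate(left_parts):
        for key, value in part:
            dest = partition_for(key, n)
            moved += int(dest != src)  # row has to travel to another partition
            left_buckets[dest].append((key, value))
    for src, part in enumerate(right_parts):
        for key, region in part:
            dest = partition_for(key, n)
            moved += int(dest != src)
            right_buckets[dest][key] = region
    # After the shuffle, matching keys sit in the same partition.
    joined = [
        (key, value, right_buckets[d].get(key))  # None if no match (left join)
        for d in range(n)
        for key, value in left_buckets[d]
    ]
    return joined, moved


def broadcast_left_join(left_parts, small_rows):
    """Toy broadcast join: only the small table is copied to each partition."""
    lookup = dict(small_rows)
    moved = len(small_rows) * len(left_parts)  # one copy per partition
    joined = [
        (key, value, lookup.get(key))
        for part in left_parts
        for key, value in part
    ]
    return joined, moved


# Illustrative data only: three countries, invented region labels.
countries = ["United Kingdom", "France", "Germany"]
lookup_rows = [
    ("United Kingdom", "Northern Europe"),
    ("France", "Western Europe"),
    ("Germany", "Western Europe"),
]
left_parts = [[] for _ in range(4)]
for i in range(1200):
    left_parts[i % 4].append((countries[i % 3], i))

shuffle_rows, shuffle_moved = shuffle_left_join(left_parts, [lookup_rows])
broadcast_rows, broadcast_moved = broadcast_left_join(left_parts, lookup_rows)

print("rows moved by shuffle join:  ", shuffle_moved)
print("rows moved by broadcast join:", broadcast_moved)
```

Both strategies produce the same joined rows, but in this toy model the shuffle moves most of the large table while the broadcast moves only a handful of lookup rows. In PySpark, wrapping the small DataFrame in `pyspark.sql.functions.broadcast` is the usual way to hint that a broadcast join is wanted; whether the optimizer honors the hint depends on the cluster configuration.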

This exercise is part of the course

Data Transformation with Spark SQL in Databricks
