When to use a broadcast join

Now that you know shuffle is the bottleneck, your team at Global Retail Analytics wants to fix a slow query. It joins a sales_transactions table (50 million rows) with a product_categories table (2,000 rows) using a standard left join, and it runs for over ten minutes. You suggest switching to a broadcast join using F.broadcast().

Why would a broadcast join speed up this query?
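
One way to reason about this question is to compare how many rows each strategy has to send across the network. The following is a plain-Python sketch of the mechanism, not Spark itself. The helper names, the tiny sample rows, the category_id key, and the figure of 100 workers are all assumptions made for illustration; only the 50 million and 2,000 row counts come from the scenario above.

```python
def left_join_broadcast(sales_partitions, categories):
    """Left join every partition locally against a full copy of the small table."""
    # The small table becomes an in-memory lookup that each worker holds in full.
    lookup = dict(categories)
    joined = []
    for partition in sales_partitions:  # large-table partitions stay where they are
        for sale_id, category_id in partition:
            # .get() returns None when there is no match, as a left join would.
            joined.append((sale_id, category_id, lookup.get(category_id)))
    return joined


def rows_moved_shuffle(n_large, n_small):
    """Upper bound for a shuffle join: both tables are repartitioned by join key."""
    return n_large + n_small


def rows_moved_broadcast(n_small, n_workers):
    """Broadcast join: only the small table travels, one copy per worker."""
    return n_small * n_workers


# Hypothetical sample data: (sale_id, category_id) rows split over two partitions.
sales_partitions = [
    [(1, "A"), (2, "B")],
    [(3, "A"), (4, "Z")],  # "Z" has no matching category
]
categories = [("A", "Apparel"), ("B", "Books")]

joined = left_join_broadcast(sales_partitions, categories)
shuffle_cost = rows_moved_shuffle(50_000_000, 2_000)
broadcast_cost = rows_moved_broadcast(2_000, 100)  # 100 workers is an assumption

print(joined)
print(shuffle_cost, broadcast_cost)
```

Under these assumptions the broadcast plan moves roughly 200,000 small rows instead of up to about 50 million large ones, and the 50 million sales rows never leave their partitions. In PySpark the corresponding change would look something like sales_transactions.join(F.broadcast(product_categories), on="category_id", how="left"), where F is pyspark.sql.functions and category_id is a hypothetical join column.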

This exercise is part of the course

Data Transformation with Spark SQL in Databricks
