When to use a broadcast join

Now that you know shuffling is the bottleneck, your team at Global Retail Analytics wants to fix a slow query. The query joins a sales_transactions table (50 million rows) with a product_categories table (2,000 rows) using a standard left join, and it runs for over ten minutes. You suggest switching to a broadcast join by wrapping the small product_categories table in F.broadcast().

Why would a broadcast join speed up this query?
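One way to reason about the question is to compare how much data each strategy has to move. The back-of-the-envelope sketch below uses the row counts from the scenario; the executor count of 20 and both helper functions are hypothetical, and the model is deliberately simplified (it counts rows rather than bytes and treats the shuffle figure as an upper bound, since some rows may already sit on the right node).

```python
def shuffle_join_rows_moved(large_rows: int, small_rows: int) -> int:
    """Upper bound for a shuffle join: both sides are repartitioned by key."""
    return large_rows + small_rows


def broadcast_join_rows_moved(small_rows: int, num_executors: int) -> int:
    """Broadcast join: only the small table moves, one copy per executor."""
    return small_rows * num_executors


sales_rows = 50_000_000   # sales_transactions, from the scenario
category_rows = 2_000     # product_categories, from the scenario
executors = 20            # hypothetical cluster size

shuffled = shuffle_join_rows_moved(sales_rows, category_rows)
broadcasted = broadcast_join_rows_moved(category_rows, executors)

print(f"shuffle join:   up to {shuffled:,} rows moved")
print(f"broadcast join: {broadcasted:,} rows moved")
```

Under these assumptions the broadcast join moves tens of thousands of rows instead of tens of millions, because the 50-million-row table stays where it is and each executor joins its local partitions against its own copy of the small table.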

This exercise is part of the course

Data Transformation with Spark SQL in Databricks
