When to use a broadcast join
Now that you know shuffle is the bottleneck, your team at Global Retail Analytics wants to fix a slow query. It joins a sales_transactions table (50 million rows) with a product_categories table (2,000 rows) using a standard left join, and it runs for over ten minutes. You suggest switching to a broadcast join using F.broadcast().
Why would a broadcast join speed up this query?
This exercise is part of the course
Data Transformation with Spark SQL in Databricks