Practicing caching: putting it all together
What was the best approach to caching df1 and df2 and why?
Your results will vary, but here is one sample result for each of the two approaches:
First answer (cache df1):
df1_1st : 2.4s
df1_2nd : 0.1s
df2_1st : 0.3s
df2_2nd : 0.2s
Overall elapsed : 3.9s
Second answer (cache df2):
df1_1st : 2.3s
df1_2nd : 1.1s
df2_1st : 1.7s
df2_2nd : 0.1s
Overall elapsed : 6.4s
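To make the comparison concrete, the short sketch below tabulates the two sample runs and picks the one with the lower overall time. It uses only the numbers listed above and assumes the "Overall elapsed" figures are in seconds, like the per-step timings. The names `runs`, `summarize`, `summary`, and `fastest` are illustrative and not part of the exercise.

```python
# Timings copied from the two sample runs above.
# Assumption: "Overall elapsed" is in seconds, like the per-step values.
runs = {
    "cache df1": {
        "steps": {"df1_1st": 2.4, "df1_2nd": 0.1, "df2_1st": 0.3, "df2_2nd": 0.2},
        "overall": 3.9,
    },
    "cache df2": {
        "steps": {"df1_1st": 2.3, "df1_2nd": 1.1, "df2_1st": 1.7, "df2_2nd": 0.1},
        "overall": 6.4,
    },
}


def summarize(run):
    """Return (sum of the four measured steps, overall elapsed) for one run."""
    steps_total = round(sum(run["steps"].values()), 1)
    return steps_total, run["overall"]


summary = {name: summarize(run) for name, run in runs.items()}

# The approach with the smallest overall elapsed time in this sample.
fastest = min(runs, key=lambda name: runs[name]["overall"])

for name, (steps_total, overall) in summary.items():
    print(f"{name}: measured steps {steps_total}s, overall {overall}s")
print(f"Fastest overall in this sample: {fastest}")
```

In this sample, caching df1 comes out ahead both on the sum of the measured steps and on overall elapsed time. The overall figure is larger than the sum of the four steps in both runs, so it presumably includes work outside the four timed actions.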
This exercise is part of the course
Introduction to Spark SQL in Python