Practicing caching: putting it all together
What was the best approach to caching df1 and df2, and why?
Your results will vary, but here is one sample result for each of the two approaches:
First answer (cache df1):
df1_1st : 2.4s
df1_2nd : 0.1s
df2_1st : 0.3s
df2_2nd : 0.2s
Overall elapsed : 3.9s
Second answer (cache df2):
df1_1st : 2.3s
df1_2nd : 1.1s
df2_1st : 1.7s
df2_2nd : 0.1s
Overall elapsed : 6.4s
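In the sample runs above, caching df1 finished sooner overall. That is what you would expect if df2 is computed from df1, so that a cached df1 speeds up both DataFrames. The exercise text here does not show how the DataFrames are built, so treat that as an assumption. Timings like these could be collected with a sketch along the following lines. The helper names `timed` and `compare_caching`, the synthetic data from `spark.range`, and the choice of `count()` as the action are all hypothetical and are not the exercise's own code.

```python
import time


def timed(label, action):
    """Run a zero-argument callable, print the elapsed time, and return it."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    print(f"{label} : {elapsed:.1f}s")
    return elapsed


def compare_caching(spark, cache_first=True):
    """Time two count() actions on each of two related DataFrames.

    `spark` is an existing SparkSession. The DataFrames below are
    placeholders (assumption): df2 is derived from df1.
    """
    # Imported lazily so the timing helper works without pyspark installed.
    from pyspark.sql import functions as F

    df1 = spark.range(5_000_000).withColumn("bucket", F.col("id") % 100)
    df2 = df1.groupBy("bucket").count()

    # cache() is lazy: nothing is stored until the first action runs.
    (df1 if cache_first else df2).cache()

    begin = time.perf_counter()
    timed("df1_1st", df1.count)
    timed("df1_2nd", df1.count)
    timed("df2_1st", df2.count)
    timed("df2_2nd", df2.count)
    print(f"Overall elapsed : {time.perf_counter() - begin:.1f}s")

    # Release any cached data so a second run starts from a clean state.
    df1.unpersist()
    df2.unpersist()
```

Calling `compare_caching(spark, cache_first=True)` and then `compare_caching(spark, cache_first=False)` would produce one report per approach. Because caching is lazy, the first action on the cached DataFrame pays the cost of materializing it, which is why the first timing is usually the largest.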
This exercise is part of the course
Introduction to Spark SQL in Python
