ComenzarEmpieza gratis

Practicing caching: putting it all together

What was the best approach to caching df1 and df2 and why?

Your results will vary; but here is one (random) result for each of the two approaches:

First answer (cache df1):

df1_1st : 2.4s
df1_2nd : 0.1s
df2_1st : 0.3s
df2_2nd : 0.2s
Overall elapsed : 3.9

Second answer (cache df2):

df1_1st : 2.3s
df1_2nd : 1.1s
df2_1st : 1.7s
df2_2nd : 0.1s
Overall elapsed : 6.4

Este ejercicio forma parte del curso

Introduction to Spark SQL in Python

Ver curso

Ejercicio interactivo práctico

Pon en práctica la teoría con uno de nuestros ejercicios interactivos

Empieza el ejercicio