MulaiMulai sekarang secara gratis

Practicing caching: putting it all together

What was the best approach to caching df1 and df2 and why?

Your results will vary; but here is one (random) result for each of the two approaches:

First answer (cache df1):

df1_1st : 2.4s
df1_2nd : 0.1s
df2_1st : 0.3s
df2_2nd : 0.2s
Overall elapsed : 3.9

Second answer (cache df2):

df1_1st : 2.3s
df1_2nd : 1.1s
df2_1st : 1.7s
df2_2nd : 0.1s
Overall elapsed : 6.4

Latihan ini adalah bagian dari kursus

Introduction to Spark SQL in Python

Lihat Kursus

Latihan interaktif praktis

Ubah teori menjadi tindakan dengan salah satu latihan interaktif kami.

Mulai berolahraga