
Inspecting cache in the Spark UI

A dataframe partitioned_df is available. It is used to register a temporary table called text, which is then cached using spark.catalog.cacheTable('text'). If you were running Spark locally, the Spark UI would be available at http://localhost:4040/storage/. For this exercise, examine the following image, which shows what the Spark UI would display once the cache for text is loaded:

[Image: Spark UI Storage]

This shows that a table called text, with seven partitions, is cached in memory. Which of the following would immediately cause the above to appear in the Spark UI?

  1. Performing a transformation on the underlying dataframe, for example: df = partitioned_df.distinct()

  2. Counting the underlying dataframe, for example: partitioned_df.count()

  3. Querying the table, for example: spark.sql("select count(*) from text")

  4. Querying the table and showing the result, for example: spark.sql("select count(*) from text").show()
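
The setup described in this exercise can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the function name cache_text_table, the sample rows, the column names, and the local session settings are all assumptions made for the example, and createOrReplaceTempView is assumed as the way the temporary table is registered. Only partitioned_df, the table name text, and spark.catalog.cacheTable('text') come from the exercise.

```python
def cache_text_table(num_partitions=7):
    """Register a dataframe as the temp table `text` and request caching.

    Returns the SparkSession and the value of spark.catalog.isCached('text').
    """
    # Imported inside the function so this sketch can be loaded
    # on a machine without a Spark installation.
    from pyspark.sql import SparkSession

    # Assumed local session; while it runs, the Storage tab is served
    # at http://localhost:4040/storage/ by default.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("cache-inspection-sketch")
        .getOrCreate()
    )

    # Hypothetical stand-in for the exercise's partitioned_df:
    # made-up rows, repartitioned into seven partitions.
    rows = [(i, "word{}".format(i)) for i in range(100)]
    partitioned_df = (
        spark.createDataFrame(rows, ["id", "word"])
        .repartition(num_partitions)
    )

    # Register the temporary table called `text`, then request caching.
    partitioned_df.createOrReplaceTempView("text")
    spark.catalog.cacheTable("text")

    # isCached reports whether the table is registered with the cache.
    return spark, spark.catalog.isCached("text")
```

Calling spark, registered = cache_text_table() in a local environment with PySpark installed starts a session whose Storage tab can then be inspected in a browser. Call spark.stop() when finished.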

This exercise is part of the course

Introduction to Spark SQL in Python
