Partitions in your data

SparkContext's textFile() method takes an optional second argument, minPartitions, which specifies the minimum number of partitions. In this exercise, you'll create an RDD named fileRDD_part with 5 partitions and compare it with the fileRDD that you created in the previous exercise. Refer to the "Understanding Partition" slide in video 2.1 for the methods that set the number of partitions when creating an RDD and that return the number of partitions in an existing RDD.

Remember, you already have a SparkContext sc, file_path and fileRDD available in your workspace.
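
To build some intuition for why minPartitions is a lower bound rather than an exact count, here is a minimal pure-Python sketch of the split-size rule used by the Hadoop file input format that textFile() relies on. It is a simplified model under stated assumptions: a single splittable, uncompressed file, and illustrative values for block_size and min_size. The function name estimate_partitions is hypothetical and is not part of PySpark.

```python
# Simplified model of how input splits are sized for one splittable file.
# estimate_partitions is a hypothetical helper, not a PySpark function;
# block_size and min_size are illustrative defaults, not cluster settings.

SPLIT_SLOP = 1.1  # a last chunk up to 10% larger than split_size is kept whole


def estimate_partitions(total_bytes, min_partitions,
                        block_size=128 * 1024 * 1024, min_size=1):
    """Estimate how many splits a single file of total_bytes would produce."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive in this sketch")

    # Target size per split if the file were cut into min_partitions pieces.
    goal_size = total_bytes // max(min_partitions, 1)
    # A split is never larger than a block and never smaller than min_size.
    split_size = max(min_size, min(goal_size, block_size))

    count = 0
    remaining = total_bytes
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining != 0:
        count += 1
    return count


# A 1000-byte file with min_partitions=5 yields exactly 5 splits.
print(estimate_partitions(1000, 5))                  # 5
# With a small block size, the same request can yield more than the minimum.
print(estimate_partitions(1000, 2, block_size=100))  # 10
```

In this model the requested value only sets a target split size, so a file that is large relative to the block size can end up with more partitions than requested. On a real RDD, the actual count is what the partition-count method from the slide reports.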

This exercise is part of the course

Big Data Fundamentals with PySpark

Exercise instructions

  • Find the number of partitions in the fileRDD RDD.
  • Create an RDD named fileRDD_part from the file path, this time with 5 partitions.
  • Confirm the number of partitions in the new fileRDD_part RDD.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Check the number of partitions in fileRDD
print("Number of partitions in fileRDD is", fileRDD.____)

# Create a fileRDD_part from file_path with 5 partitions
fileRDD_part = sc.textFile(____, minPartitions = ____)

# Check the number of partitions in fileRDD_part
print("Number of partitions in fileRDD_part is", fileRDD_part.____)