Exercise

Partitions in your data

SparkContext's textFile() method takes an optional second argument, minPartitions, which specifies the minimum number of partitions for the resulting RDD. In this exercise, you'll create an RDD named fileRDD_part with 5 partitions and compare it with fileRDD, which you created in the previous exercise. Refer to the "Understanding Partition" slide in video 2.1 for the methods that create an RDD with a given number of partitions and report how many partitions an RDD has.

Remember, a SparkContext sc, file_path, and fileRDD are already available in your workspace.

Instructions

  • Find the number of partitions that back the fileRDD RDD.
  • Create an RDD named fileRDD_part from file_path with a minimum of 5 partitions.
  • Confirm the number of partitions in the new fileRDD_part RDD.
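
The three steps above can be sketched as a small helper. This is only an illustration, not the course's reference solution: the function name compare_partitions and its min_partitions parameter are hypothetical, and it assumes the sc, file_path, and fileRDD objects described earlier are passed in. It relies on two PySpark calls, RDD.getNumPartitions() and SparkContext.textFile(name, minPartitions).

```python
def compare_partitions(sc, file_path, fileRDD, min_partitions=5):
    """Hypothetical helper: compare partition counts of two RDDs.

    sc        -- an existing SparkContext (assumed to be provided by the workspace)
    file_path -- path of the text file to load
    fileRDD   -- the RDD created in the previous exercise
    """
    # Step 1: ask the existing RDD how many partitions it has.
    original_count = fileRDD.getNumPartitions()

    # Step 2: reload the file, requesting at least `min_partitions` partitions.
    # minPartitions is a lower bound, so Spark may create more than requested.
    fileRDD_part = sc.textFile(file_path, minPartitions=min_partitions)

    # Step 3: confirm the partition count of the new RDD.
    new_count = fileRDD_part.getNumPartitions()

    return original_count, new_count, fileRDD_part
```

In the exercise workspace, calling compare_partitions(sc, file_path, fileRDD) and printing the first two returned values would show the partition counts of fileRDD and fileRDD_part side by side. The exact numbers depend on the file and the Spark configuration, so none are shown here.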