1. Learn
  2. /
  3. Courses
  4. /
  5. Big Data Fundamentals with PySpark

Connected

Exercise

RDDs from Parallelized collections

Resilient Distributed Dataset (RDD) is the basic abstraction in Spark. It is an immutable distributed collection of objects. Since RDD is a fundamental and backbone data type in Spark, it is important that you understand how to create it. In this exercise, you'll create your first RDD in PySpark from a collection of words.

Remember, you already have a SparkContext sc available in your workspace.

Instructions

100 XP
  • Create a RDD named RDD from a Python list of words.
  • Confirm the object created is RDD.