1. เรียนรู้
  2. /
  3. Courses
  4. /
  5. Big Data Fundamentals with PySpark

Connected

Exercises

RDDs from Parallelized collections

Resilient Distributed Dataset (RDD) is the basic abstraction in Spark. It is an immutable distributed collection of objects. Since RDD is a fundamental and backbone data type in Spark, it is important that you understand how to create it. In this exercise, you'll create your first RDD in PySpark from a collection of words.

Remember, you already have a SparkContext sc available in your workspace.

คำแนะนำ

100 XP
  • Create a RDD named RDD from a Python list of words.
  • Confirm the object created is RDD.