Exercise

Creating RDDs

In PySpark, you can create an RDD (Resilient Distributed Dataset) in several ways. Since you are already familiar with DataFrames, you will build a DataFrame here and convert it to an RDD. Remember, there's already a SparkSession called spark in your workspace!

Instructions

  • Create a DataFrame called df from the provided list.
  • Convert the DataFrame to an RDD.
  • Collect and print the resulting RDD.