Exercise

Aggregating in RDDs

Now that you have conducted analytics with DataFrames in PySpark, let's briefly do a similar task with an RDD. Using the provided code, get the sum of the values of an RDD in PySpark.

A Spark session named spark has already been created for you.

Instructions

  • Create an RDD from the provided DataFrame.
  • Apply the provided lambda function to aggregate the RDD's values by key.
  • Collect the results of the aggregation.