Creating RDDs
In PySpark, you can create an RDD (Resilient Distributed Dataset) in a few different ways. Since you are already familiar with DataFrames, you will set this up using a DataFrame. Remember, there's already a SparkSession called spark in your workspace!
This exercise is part of the course
Introduction to PySpark
Instructions
- Create a DataFrame called df from the provided salaries.csv file.
- Convert the DataFrame to an RDD.
- Collect and print the resulting RDD.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Create a DataFrame
df = spark.____("salaries.csv", header=True, inferSchema=True)
# Convert DataFrame to RDD
rdd = df.____
# Show the RDD's contents
rdd.____
print(rdd)