
RDDs from Parallelized collections

A Resilient Distributed Dataset (RDD) is the basic abstraction in Spark: an immutable, distributed collection of objects. Because the RDD is Spark's fundamental data type, it is important to understand how to create one. In this exercise, you'll create your first RDD in PySpark from a collection of words.

Remember, you already have a SparkContext sc available in your workspace.
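
To illustrate creating an RDD from a parallelized collection, here is a minimal sketch that uses a list of numbers rather than the exercise's word list. It assumes PySpark is installed; the helper names build_rdd and demo are hypothetical and not part of the course. SparkContext.parallelize distributes a local Python collection across partitions, and SparkContext.getOrCreate reuses an existing context such as the workspace's sc.

```python
def build_rdd(sc, items, num_slices=None):
    """Distribute a local Python collection as an RDD.

    `sc` is a SparkContext; `num_slices` is the requested number of
    partitions (None lets Spark use its default parallelism).
    """
    # parallelize copies the local collection into a distributed dataset
    return sc.parallelize(items, num_slices)


def demo():
    # Imported here so build_rdd can be defined without Spark installed
    from pyspark import SparkContext

    # Reuses the running context if one exists (e.g. the workspace's `sc`)
    sc = SparkContext.getOrCreate()

    rdd = build_rdd(sc, [1, 2, 3, 4, 5], num_slices=2)
    print(type(rdd))               # class of the created object
    print(rdd.getNumPartitions())  # how many partitions hold the data
    print(rdd.collect())           # gather the elements back on the driver
    return rdd
```

Calling demo() in an environment with PySpark runs the sketch end to end. Because an RDD is immutable, later transformations on it return new RDDs instead of modifying it.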

This exercise is part of the course

Big Data Fundamentals with PySpark


Exercise instructions

  • Create an RDD named RDD from a Python list of words.
  • Confirm that the object created is an RDD.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Create an RDD from a list of words
RDD = sc.____(["Spark", "is", "a", "framework", "for", "Big Data processing"])

# Print out the type of the created object
print("The type of RDD is", ____(RDD))