CountingByKeys
For many datasets, it is important to count the number of keys in a key/value dataset, for example, to count the number of countries where a product was sold or to find the most popular baby names. In this simple exercise, you'll use the Rdd
that you created earlier and count the number of unique keys in that pair RDD.
Remember, you already have a SparkContext sc and Rdd available in your workspace.
This exercise is part of the course Big Data Fundamentals with PySpark.
Exercise instructions
- Count the unique keys with countByKey() and assign the result to a variable total.
- What is the type of total?
- Iterate over total and print the keys and their counts.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Count the unique keys
total = Rdd.____()
# What is the type of total?
print("The type of total is", ____(total))
# Iterate over the total and print the output
for k, v in total.____():
    print("key", ____, "has", ____, "counts")