Bringing it all together II
Create a DataFrame, apply transformations, cache it, and check if it’s cached. Then, uncache it to release memory.
For this exercise, a Spark session has been created for you. Look carefully at the output of the .explain() method to see how Spark plans to execute the aggregation.
This exercise is part of the course
Introduction to PySpark
Exercise instructions
- Cache the df DataFrame.
- Explain the processing of the agg_result DataFrame.
- Unpersist the cached df DataFrame after processing.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Cache the DataFrame
df.____
# Perform aggregation
agg_result = df.groupBy("Department").sum("Salary")
agg_result.show()
# Analyze the execution plan
agg_result.____
# Uncache the DataFrame
df.____