Aggregating II
To get you familiar with more of the built-in aggregation methods, here are a few more exercises involving the flights table!
Remember, a SparkSession called spark is already in your workspace, along with the Spark DataFrame flights.
This exercise is part of the course Foundations of PySpark.
Exercise instructions
- Use the .avg() method to get the average air time of Delta Airlines flights (where the carrier column has the value "DL") that left SEA. The place of departure is stored in the column origin. .show() the result.
- Use the .sum() method to get the total number of hours all planes in this dataset spent in the air by creating a column called duration_hrs from the column air_time. .show() the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Average duration of Delta flights
flights.filter(____.____ == "____").filter(____.____ == "____").groupBy().avg("____").show()
# Total hours in the air
flights.withColumn("____", flights.air_time/60).groupBy().sum("____").show()