Aggregating II
To get you familiar with more of the built-in aggregation methods, here are a few more exercises involving the flights table!
Remember, a SparkSession called spark is already in your workspace, along with the Spark DataFrame flights.
This exercise is part of the course Foundations of PySpark.
Exercise instructions
- Use the .avg() method to get the average air time of Delta Airlines flights (where the carrier column has the value "DL") that left SEA. The place of departure is stored in the column origin. show() the result.
- Use the .sum() method to get the total number of hours all planes in this dataset spent in the air by creating a column called duration_hrs from the column air_time. show() the result.
Hands-on interactive exercise
Complete this exercise by finishing this sample code.
# Average duration of Delta flights
flights.filter(____.____ == "____").filter(____.____ == "____").groupBy().avg("____").show()
# Total hours in the air
flights.withColumn("____", flights.air_time/60).groupBy().sum("____").show()