IniziaInizia gratis

Aggregating II

To get you familiar with more of the built in aggregation methods, here's a few more exercises involving the flights table!

Remember, a SparkSession called spark is already in your workspace, along with the Spark DataFrame flights.

Questo esercizio fa parte del corso

Foundations of PySpark

Visualizza il corso

Istruzioni dell'esercizio

  • Use the .avg() method to get the average air time of Delta Airlines flights (where the carrier column has the value "DL") that left SEA. The place of departure is stored in the column origin. show() the result.
  • Use the .sum() method to get the total number of hours all planes in this dataset spent in the air by creating a column called duration_hrs from the column air_time. show() the result.

Esercizio pratico interattivo

Prova a risolvere questo esercizio completando il codice di esempio.

# Average duration of Delta flights
flights.filter(____.____ == "____").filter(____.____ == "____").groupBy().avg("____").show()

# Total hours in the air
flights.withColumn("____", flights.air_time/60).groupBy().sum("____").show()
Modifica ed esegui il codice