Get startedGet started for free

Aggregating

All of the common aggregation methods, like .min(), .max(), and .count() are GroupedData methods. These are created by calling the .groupBy() DataFrame method. You'll learn exactly what that means in a few exercises. For now, all you have to do to use these functions is call that method on your DataFrame. For example, to find the minimum value of a column, col, in a DataFrame, df, you could do

df.groupBy().min("col").show()

This creates a GroupedData object (so you can use the .min() method), then finds the minimum value in col, and returns it as a DataFrame.

Now you're ready to do some aggregating of your own!

A SparkSession called spark is already in your workspace, along with the Spark DataFrame flights.

This exercise is part of the course

Foundations of PySpark

View Course

Exercise instructions

  • Find the length of the shortest (in terms of distance) flight that left PDX by first .filter()ing and using the .min() method. Perform the filtering by referencing the column directly, not passing a SQL string.
  • Find the length of the longest (in terms of time) flight that left SEA by filter()ing and using the .max() method. Perform the filtering by referencing the column directly, not passing a SQL string.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Find the shortest flight from PDX in terms of distance
flights.filter(____.____ == ____).groupBy().____(____).show()

# Find the longest flight from SEA in terms of air time
flights.filter(____).groupBy().____.show()
Edit and Run Code