
Aggregating in PySpark

Now you're ready to do some aggregating of your own! You'll work with the salary dataset you've used before. Let's see what aggregations you can create! A SparkSession called spark is already in your workspace, along with the Spark DataFrame salaries_df.

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Find the minimum "salary_in_usd" at a US, Small company, denoted by "S" - perform the filtering by referencing the column directly (e.g. salaries_df.company_size), not by passing a SQL string.
  • Find the maximum "salary_in_usd" at a US, Large company, denoted by an "L" - perform the filtering by referencing the column directly (e.g. salaries_df.company_size), not by passing a SQL string.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Find the minimum salaries for small companies
salaries_df.filter(salaries_df.company_size == "S").groupBy().____.show()

# Find the maximum salaries for large companies
salaries_df.filter(salaries_df.company_size ____).____().max("salary_in_usd").show()