Aggregating in PySpark
Now you're ready to do some aggregating of your own!
You're going to work with the salary dataset you've seen in earlier exercises. Let's see what aggregations you can create!
A SparkSession called spark is already in your workspace, along with the Spark DataFrame salaries_df.
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Find the minimum salary ("salary_in_usd") at a US, Small company, performing the filtering by referencing the column directly, not passing a SQL string (see the sketch after this list for the distinction).
- Find the maximum salary ("salary_in_usd") at a US, Large company, denoted by an "L", performing the filtering by referencing the column directly, not passing a SQL string.
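Referencing the column directly means building the condition from the DataFrame's column attribute rather than handing filter() a SQL expression string. Here is a minimal sketch of the two styles, assuming the company_size column used in the sample code below:

# Column reference: the style this exercise asks for
salaries_df.filter(salaries_df.company_size == "S")

# SQL expression string: also accepted by filter(), but not what's asked for here
salaries_df.filter("company_size = 'S'")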
Hands-on interactive exercise
Try this exercise yourself; a completed version of the sample code is shown below.
# Find the minimum salary for small companies
salaries_df.filter(salaries_df.company_size == "S").groupBy().min("salary_in_usd").show()

# Find the maximum salary for large companies
salaries_df.filter(salaries_df.company_size == "L").groupBy().max("salary_in_usd").show()
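The course workspace provides spark and salaries_df for you. To reproduce the exercise locally, here is a self-contained sketch; the SparkSession setup and the toy rows are illustrative assumptions, not the course's actual dataset:

from pyspark.sql import SparkSession

# Assumption: create a local SparkSession (the course workspace normally provides one)
spark = SparkSession.builder.appName("salary-aggregations").getOrCreate()

# Assumption: a few hypothetical rows standing in for the course's salary dataset
salaries_df = spark.createDataFrame(
    [("US", "S", 60000), ("US", "S", 75000),
     ("US", "L", 150000), ("US", "L", 180000)],
    ["company_location", "company_size", "salary_in_usd"],
)

# Minimum salary at small companies
salaries_df.filter(salaries_df.company_size == "S").groupBy().min("salary_in_usd").show()

# Maximum salary at large companies
salaries_df.filter(salaries_df.company_size == "L").groupBy().max("salary_in_usd").show()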