Aggregating in PySpark
Now you're ready to do some aggregating of your own!
You're going to work with a salary dataset you've seen before. Let's see what aggregations you can create!
A SparkSession called spark is already in your workspace, along with the Spark DataFrame salaries_df.
This exercise is part of the course Introduction to PySpark.
Instructions
- Find the minimum salary at a US, Small company, performing the filtering by referencing the column directly ("salary_in_usd"), not passing a SQL string.
- Find the maximum salary at a US, Large company, denoted by an "L", performing the filtering by referencing the column directly ("salary_in_usd"), not passing a SQL string.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Find the minimum salaries for small companies
salaries_df.filter(salaries_df.company_size == "S").groupBy().____.show()
# Find the maximum salaries for large companies
salaries_df.filter(salaries_df.company_size ____).____().max("salary_in_usd").show()
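For reference, one way the blanks could be filled in is sketched below. It assumes salaries_df contains the company_size and salary_in_usd columns named in the instructions, and it calls groupBy() with no arguments so that min() and max() aggregate over all remaining rows after filtering.

# One possible completion (assumes the column names above):
# Find the minimum salaries for small companies
salaries_df.filter(salaries_df.company_size == "S").groupBy().min("salary_in_usd").show()
# Find the maximum salaries for large companies
salaries_df.filter(salaries_df.company_size == "L").groupBy().max("salary_in_usd").show()

Note that salaries_df.company_size == "S" builds a Column expression rather than a SQL string, which is exactly the direct column reference the instructions ask for.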