Complex Aggregations
To get familiar with more of the built-in aggregation methods, let's do a slightly more complex aggregation! The goal is to merge all of these commands into a single line.
Remember, a SparkSession called spark is already in your workspace, along with the Spark DataFrame salaries_df.
This exercise is part of the course
Introduction to PySpark
Instructions
- Calculate the average salaries of large US companies using the "salary_in_usd" column.
- Calculate the total salaries of large US companies.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Average salaries at large US companies
large_companies=salaries_df.filter(salaries_df.company_size == "L").filter(salaries_df.company_location == "US").groupBy().____
# Set a large_companies variable for other analytics
large_companies=salaries_df.filter(salaries_df.company_size == "L").filter(salaries_df.company_location == "US")
# Total salaries in usd
large_companies.groupBy().____.show()
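
To make clear what the exercise is asking for, here is a minimal plain-Python sketch of the same computation, with no Spark involved. It assumes a handful of made-up rows standing in for salaries_df (the values are hypothetical, not taken from the course data). The point it illustrates: calling groupBy() with no columns treats the whole filtered DataFrame as a single group, so each aggregation returns one value for all remaining rows.

```python
from statistics import mean

# Hypothetical toy rows standing in for salaries_df; the values are made up.
rows = [
    {"company_size": "L", "company_location": "US", "salary_in_usd": 100_000},
    {"company_size": "L", "company_location": "US", "salary_in_usd": 140_000},
    {"company_size": "M", "company_location": "US", "salary_in_usd": 90_000},
    {"company_size": "L", "company_location": "CA", "salary_in_usd": 120_000},
]

# Counterpart of the two chained .filter() calls: keep large US companies only.
large_us = [
    r["salary_in_usd"]
    for r in rows
    if r["company_size"] == "L" and r["company_location"] == "US"
]

# Counterpart of groupBy() with no keys: one aggregate over all remaining rows.
average_salary = mean(large_us)
total_salary = sum(large_us)

print(average_salary, total_salary)
```

Only the two rows that pass both conditions contribute to the average and the total. In the exercise, Spark performs the same filtering and aggregation on salaries_df itself.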