
Aggregating in PySpark

Now you're ready to do some aggregating of your own! You'll work with the salary dataset you've used before. Let's see what aggregations you can create! A SparkSession called spark is already in your workspace, along with the Spark DataFrame salaries_df.

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Find the minimum salary ("salary_in_usd") at a US, Small company, denoted by "S", performing the filtering by referencing the column directly, not passing a SQL string.
  • Find the maximum salary ("salary_in_usd") at a US, Large company, denoted by "L", performing the filtering by referencing the column directly, not passing a SQL string.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Find the minimum salaries for small companies
salaries_df.filter(salaries_df.company_size == "S").groupBy().____.show()

# Find the maximum salaries for large companies
salaries_df.filter(salaries_df.company_size ____).____().max("salary_in_usd").show()