Introduction to PySpark

Exercise

Reading a CSV and performing aggregations

You have a spreadsheet of Data Scientist salaries from companies ranging in size from small to large. You want to see whether average salaries differ substantially when grouped by company size.

Remember, there's already a SparkSession called spark in your workspace!

Instructions

  • Load a CSV file as a DataFrame and infer the schema.
  • Return the number of rows.
  • Group by the column company_size and calculate the average of salary_in_usd.