Exercise

Reading a CSV and performing aggregations

You have a spreadsheet of Data Scientist salaries from companies ranging in size from small to large. You want to see whether average salaries differ substantially by company size.

Remember, there's already a SparkSession called spark in your workspace!

Instructions

  • Load a CSV file as a DataFrame and infer the schema.
  • Return the number of rows.
  • Group by the company_size column and calculate the average of salary_in_usd.