1. Learn
  2. /
  3. Courses
  4. /
  5. ETL and ELT in Python

Connected

Exercise

Grouping data with pandas

The output of a data pipeline is typically a "modeled" dataset. This dataset provides data consumers easy access to information, without having to perform much manipulation. Grouping data with pandas helps to build modeled datasets,

pandas has been imported as pd, and the raw_testing_scores DataFrame contains data in the following form:

              street_address       city  math_score  reading_score  writing_score
01M539   111 Columbia Street  Manhattan       657.0          601.0          601.0
02M294      350 Grand Street  Manhattan       395.0          411.0          387.0
02M308      350 Grand Street  Manhattan       418.0          428.0          415.0

Instructions

100 XP
  • Use .loc[] to only keep the "city", "math_score", "reading_score", and "writing_score" columns.
  • Group the DataFrame by the "city" column, and find the mean of each city's math, reading, and writing scores.
  • Use the transform() function to create a grouped DataFrame.