Exercise

Validating a data pipeline at "checkpoints"

In this exercise, you'll work with a data pipeline that extracts tax data from a CSV file, creates a new column, filters out rows based on average taxable income, and persists the data to a Parquet file.

pandas has been loaded as pd, and the extract(), transform(), and load() functions have already been defined. You'll use these functions to validate the data pipeline at various checkpoints throughout its execution.

Instructions 1/3

  • Print the shape of the raw_tax_data and clean_tax_data DataFrames and observe the difference in dimensions.
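
The shape check above might look roughly like the following sketch. The exercise's own extract() and transform() are predefined and not shown, so the versions here are hypothetical stand-ins, as are the column names, the sample rows, and the threshold of 40; only the .shape comparison at the end reflects what the instruction asks for.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tax CSV file used in the exercise
RAW_CSV = io.StringIO(
    "industry_name,number_of_firms,total_taxable_income\n"
    "Retail,10,500\n"
    "Mining,5,1000\n"
    "Farming,20,400\n"
)


def extract(source):
    # Assumed behavior: read the raw CSV into a DataFrame
    return pd.read_csv(source)


def transform(raw_data, threshold=40):
    # Assumed behavior: add a derived column, then filter rows on it
    data = raw_data.copy()
    data["average_taxable_income"] = (
        data["total_taxable_income"] / data["number_of_firms"]
    )
    return data.loc[data["average_taxable_income"] > threshold, :]


raw_tax_data = extract(RAW_CSV)
clean_tax_data = transform(raw_tax_data)

# Checkpoint: compare dimensions before and after the transformation
print(f"Shape of raw_tax_data: {raw_tax_data.shape}")
print(f"Shape of clean_tax_data: {clean_tax_data.shape}")
```

With this sample data, the clean DataFrame has one more column (the derived average) and one fewer row (the row filtered out by the threshold) than the raw DataFrame. A difference of that kind is what the checkpoint is meant to surface.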