Exercise

Testing a data pipeline end-to-end

In this exercise, you'll work with the same data pipeline as before, which extracts, transforms, and loads tax data. You'll practice testing this pipeline end-to-end to ensure it can be run multiple times without duplicating the transformed data in the Parquet file.

pandas has been loaded as pd, and the extract(), transform(), and load() functions have already been defined.

Instructions

  • Run the ETL pipeline three times, using a for-loop.
  • Print the shape of clean_tax_data in each iteration of the pipeline run.
  • Read the DataFrame stored in the "clean_tax_data.parquet" file into the to_validate variable.
  • Output the shape of the to_validate DataFrame and compare it to the shape of clean_tax_data to ensure data wasn't duplicated across pipeline runs.