Validating a data pipeline with assert
To build unit tests for data pipelines, it's important to get familiar with the assert keyword and the isinstance() function. In this exercise, you'll practice using these two tools to validate components of a data pipeline.
The functions extract() and transform() have been made available for you, along with pandas, which has been imported as pd. Both extract() and transform() return a DataFrame. Good luck!
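As a rough illustration of how assert and isinstance() behave together, here is a minimal sketch. The extract() and transform() bodies below are hypothetical stand-ins written only for this example (the course's real functions are not shown on this page), and the column names and threshold are made up.

```python
import pandas as pd


# Hypothetical stand-ins for the exercise's extract() and transform();
# the real functions are supplied by the course environment.
def extract(file_path):
    # Ignores file_path and returns a small in-memory DataFrame for illustration
    return pd.DataFrame(
        {"industry": ["retail", "tech"], "tax_owed": [100.0, 250.0]}
    )


def transform(raw_data):
    # Keep only rows above an arbitrary, made-up threshold
    return raw_data.loc[raw_data["tax_owed"] > 150.0, :]


raw = extract("example.csv")
clean = transform(raw)

# assert passes silently when its condition is True
assert isinstance(raw, pd.DataFrame)
assert isinstance(clean, pd.DataFrame)

# A failing assert raises AssertionError, optionally carrying a message
try:
    assert isinstance(clean, list), "expected a list"
except AssertionError as err:
    failure_message = str(err)
```

Here the first two checks pass without output, while the third fails on purpose so the AssertionError and its message can be inspected.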
This exercise is part of the course ETL and ELT in Python.
Have a go at this exercise by completing this sample code.
raw_tax_data = extract("raw_tax_data.csv")
clean_tax_data = transform(raw_tax_data)
# Validate the number of columns in the DataFrame
____ len(clean_tax_data.columns) == ____
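The same pattern can be tried on a toy DataFrame before filling in the blanks above. This is only a sketch: example_df, expected_columns, and the helper has_column_count() are hypothetical names introduced here, and the value 3 applies to this toy data only, not to clean_tax_data.

```python
import pandas as pd

# Hypothetical toy data; the real clean_tax_data comes from transform()
example_df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Assumption for this toy DataFrame only
expected_columns = 3

# The optional message makes a failure easier to diagnose
assert len(example_df.columns) == expected_columns, (
    f"expected {expected_columns} columns, found {len(example_df.columns)}"
)


def has_column_count(df, n):
    # Hypothetical helper: check the type first so len(df.columns)
    # is only evaluated on an actual DataFrame
    return isinstance(df, pd.DataFrame) and len(df.columns) == n
```

Checking the type before the shape means a non-DataFrame input fails the check cleanly instead of raising an AttributeError on the missing columns attribute.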