Writing unit tests with pytest
In this exercise, you'll practice writing a unit test to validate a data pipeline. You'll use assert
and other tools to build the test and check whether the data pipeline performs as it should.
The functions extract() and transform() have been made available for you, along with pandas, which has been imported as pd. You'll be testing the transform() function, which is shown below.
def transform(raw_data):
    raw_data["average_taxable_income"] = raw_data["total_taxable_income"] / raw_data["number_of_firms"]
    clean_data = raw_data.loc[raw_data["average_taxable_income"] > 100, :]
    clean_data.set_index("industry_name", inplace=True)
    return clean_data
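To see what transform() does in practice, here is a minimal sketch that runs it on a small, made-up DataFrame. The three rows are hypothetical and are not taken from raw_tax_data.csv. It illustrates two side effects worth knowing when writing tests: the new column is added to the input DataFrame in place, and industry_name moves from the columns into the index of the result.

```python
import pandas as pd


def transform(raw_data):
    # Same logic as the transform() function shown above
    raw_data["average_taxable_income"] = raw_data["total_taxable_income"] / raw_data["number_of_firms"]
    clean_data = raw_data.loc[raw_data["average_taxable_income"] > 100, :]
    clean_data.set_index("industry_name", inplace=True)
    return clean_data


# Hypothetical sample data, for illustration only
raw = pd.DataFrame({
    "industry_name": ["Retail", "Mining", "Farming"],
    "number_of_firms": [10, 4, 5],
    "total_taxable_income": [5000, 200, 1000],
})

clean = transform(raw)

print(list(clean.index))    # industries whose average exceeds 100
print(list(clean.columns))  # industry_name is now the index, not a column
print(list(raw.columns))    # the input DataFrame gained average_taxable_income
```

With this sample, Mining averages 50 per firm and is filtered out, while Retail (500) and Farming (200) remain.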
This exercise is part of the course ETL and ELT in Python.
Exercise instructions
- Import the pytest library.
- Assert that the value stored in the clean_tax_data variable is an instance of a pd.DataFrame.
- Validate that the number of columns in the clean_tax_data DataFrame is greater than the number of columns in the raw_tax_data DataFrame.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import ____

def test_transformed_data():
    raw_tax_data = extract("raw_tax_data.csv")
    clean_tax_data = transform(raw_tax_data)

    # Assert that the transform function returns a pd.DataFrame
    assert ____(clean_tax_data, pd.DataFrame)

    # Assert that the clean_tax_data DataFrame has more columns than the raw_tax_data DataFrame
    ____ len(clean_tax_data.columns) ____ len(raw_tax_data.columns)
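For practice outside the interactive environment, the following is a rough, self-contained sketch of a similar test. It is not the exercise solution: extract() is not shown here, so a hypothetical helper named make_raw_tax_data() builds a small in-memory DataFrame instead of reading raw_tax_data.csv. The sketch checks for the derived column by name rather than comparing column counts, because transform() as shown also adds the column to its input and moves industry_name into the index.

```python
import pandas as pd


def transform(raw_data):
    # Same logic as the transform() function from the exercise
    raw_data["average_taxable_income"] = raw_data["total_taxable_income"] / raw_data["number_of_firms"]
    clean_data = raw_data.loc[raw_data["average_taxable_income"] > 100, :]
    clean_data.set_index("industry_name", inplace=True)
    return clean_data


def make_raw_tax_data():
    # Hypothetical stand-in for extract("raw_tax_data.csv"); values are made up
    return pd.DataFrame({
        "industry_name": ["Retail", "Mining", "Farming"],
        "number_of_firms": [10, 4, 5],
        "total_taxable_income": [5000, 200, 1000],
    })


def test_transformed_data():
    raw_tax_data = make_raw_tax_data()
    clean_tax_data = transform(raw_tax_data)

    # The transform step should return a DataFrame
    assert isinstance(clean_tax_data, pd.DataFrame)

    # The derived column should be present in the result
    assert "average_taxable_income" in clean_tax_data.columns

    # Only rows above the threshold of 100 should remain
    assert (clean_tax_data["average_taxable_income"] > 100).all()

    # industry_name should have become the index
    assert clean_tax_data.index.name == "industry_name"
```

Saved in a file such as test_transform.py (a hypothetical name), this can be run with the pytest command, which collects functions whose names start with test_. Plain assert statements are enough here, since pytest reports the failing expression when an assertion does not hold.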