BaşlayınÜcretsiz Başlayın

Writing unit tests with pytest

In this exercise, you'll practice writing a unit test to validate a data pipeline. You'll use assert and other tools to build the tests, and determine if the data pipeline performs as it should.

The functions extract() and transform() have been made available for you, along with pandas, which has been imported as pd. You'll be testing the transform() function, which is shown below.

def transform(raw_data):
    raw_data["average_taxable_income"] = raw_data["total_taxable_income"] / raw_data["number_of_firms"]
    clean_data = raw_data.loc[raw_data["average_taxable_income"] > 100, :]
    clean_data.set_index("industry_name", inplace=True)
    return clean_data

Bu egzersiz

ETL and ELT in Python

kursunun bir parçasıdır
Kursu Görüntüle

Egzersiz talimatları

  • Import the pytest library.
  • Assert that the value stored in the clean_tax_data variables is an instance of a pd.DataFrame.
  • Validate that the number of columns in the clean_tax_data DataFrame is greater than the columns stored in the raw_tax_data DataFrame.

Uygulamalı interaktif egzersiz

Bu örnek kodu tamamlayarak bu egzersizi bitirin.

import ____

def test_transformed_data():
    raw_tax_data = extract("raw_tax_data.csv")
    clean_tax_data = transform(raw_tax_data)
    
    # Assert that the transform function returns a pd.DataFrame
    assert ____(clean_tax_data, pd.DataFrame)
    
    # Assert that the clean_tax_data DataFrame has more columns than the raw_tax_data DataFrame
    ____ len(clean_tax_data.columns) ____ len(raw_tax_data.columns)
Kodu Düzenle ve Çalıştır