Testing Airflow code
1. Testing Airflow code
We wouldn't deploy application code without tests. Our Dags deserve the same treatment.
2. Three levels of testing
Airflow code benefits from three levels of testing, shown here as a pyramid. At the base, integrity tests check whether our Dags even load without errors. These are fast and cheap, so they form the foundation. In the middle, unit tests verify that the business logic produces correct results. At the top, integration tests run the full Dag end-to-end. Each level catches different kinds of bugs, and together they give us confidence to deploy changes without breaking production.
3. Integrity tests with DagBag
We import DagBag from airflow.models and create an instance with include_examples=False, so it only loads our Dags. Then we write two test functions using assert, which checks a condition and raises an error if it's false. The first function checks dag_bag.import_errors, a dictionary that maps file paths to error messages. If it's empty, every Dag file loaded without issues. The second function checks dag_bag.dags, a dictionary that maps dag_id strings to Dag objects. We verify that daily_etl is present, confirming that the Dag we expect is actually loaded and available.
4. Why Dags break on import
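To make these failure modes concrete, here is a small sketch that uses only the standard library, not Airflow. It imports deliberately broken throwaway modules and reports which exception stops the import. The helper try_import and the names inside the sample sources are made up for illustration, and the circular-import case is left out.

```python
import importlib.util
import tempfile
from pathlib import Path


def try_import(source: str) -> str:
    """Import `source` as a throwaway module; return the error class name or 'ok'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fake_dag.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location("fake_dag", path)
        module = importlib.util.module_from_spec(spec)
        try:
            # Executing the module body is where import-time errors surface.
            spec.loader.exec_module(module)
        except Exception as exc:
            return type(exc).__name__
    return "ok"


# A package that is not installed fails on the import statement itself.
print(try_import("import a_provider_that_is_not_installed"))  # ModuleNotFoundError

# A name that was renamed or never defined fails as soon as the line runs.
print(try_import("schedule = DAILY_SCHEDULE"))  # NameError

# A healthy file imports cleanly.
print(try_import("x = 1"))  # ok
```

In each failing case the module never finishes loading, which is why nothing defined in that file becomes available afterwards.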
Import errors are the most common cause of Dag failures in production. The image shows what this looks like in the Airflow UI: the Dag simply disappears from the list with no warning. A ModuleNotFoundError means a provider package is missing or we used the wrong import path. A NameError happens when we reference a variable or function that was renamed or doesn't exist. And an ImportError from a circular import prevents the entire module from loading. Without an integrity test, we only find out when the scheduler silently drops the Dag. With one, we catch these errors automatically, for example in a continuous integration pipeline that runs the tests every time we push code.
5. Unit testing task functions
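A minimal sketch of the pattern this slide describes, with the logic extracted into a plain function and tested directly. The sample record is invented for illustration, and the @task wrapper is shown only as a comment so the snippet runs without Airflow.

```python
def clean_record(record: dict) -> dict:
    """Strip whitespace from the name and lowercase the email."""
    return {
        **record,
        "name": record["name"].strip(),
        "email": record["email"].lower(),
    }


# In the Dag file, the task stays a thin wrapper around the function above
# (needs Airflow's task decorator, so it is left as a comment here):
#
# @task
# def transform(records):
#     return [clean_record(r) for r in records]


def test_clean_record():
    raw = {"name": "  Ada Lovelace ", "email": "Ada@Example.COM"}
    cleaned = clean_record(raw)
    assert cleaned["name"] == "Ada Lovelace"
    assert cleaned["email"] == "ada@example.com"
```

Because clean_record returns a new dictionary instead of mutating its input, the test can compare input and output without worrying about shared state.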
Unit tests target the business logic. The key insight is that a @task function is just a Python function with a decorator, so the easiest way to test the logic is to extract it into a standalone function. In the Dag file, the clean_record function takes a record dictionary, strips whitespace from the name, and converts the email to lowercase. The @task function transform calls clean_record for each record. The test file then imports clean_record directly and calls it with test input. We assert that the name comes back without whitespace. No Airflow runtime and no database are needed, just a fast Python assertion that runs in milliseconds.
6. Integration tests with dag.test()
For integration tests, we import pytest for test discovery and assertions, along with datetime from pendulum. Airflow uses pendulum for its date and time handling, so we use it here to set the logical date. We load the Dag from DagBag using get_dag and verify that it exists. Then we call dag.test() with a specific logical_date, which runs all tasks sequentially in a single process. After the run, we verify the outputs: we check that the file was created and that it contains the expected results. That gives us an end-to-end test running the actual Dag code.
7. Testing in CI
In practice, we run all three levels automatically whenever we push code to our repository, using a continuous integration tool such as GitHub Actions or GitLab CI. As the diagram on the left shows, the pipeline runs lint checks first, then integrity and unit tests, then integration tests, and finally deploys. Integrity and unit tests are fast, so they run on every commit. Integration tests are slower since they execute the full Dag, so teams often run them on pull requests or on a nightly schedule. The important thing is that no Dag reaches production without passing all three levels.
8. Let's practice!
Let's write some tests.