Dynamic Task Mapping

1. Dynamic Task Mapping

So far, every task has been predefined. But what happens when we don't know how many tasks we need until the Dag runs?

2. The problem with hardcoded tasks

Imagine we have a pipeline that processes files from a data lake, which is a centralized store where raw data lands before processing. Today, there are three files, so we write three tasks. Tomorrow there might be ten, and next week there could be twenty. Every time the data changes, we have to update the code. Hardcoding a task for each file doesn't scale. We need tasks that adapt at runtime.

3. .expand()

That's where dynamic task mapping comes in. Let's walk through the code. First, we define a task called fetch_files that returns a list of file paths. This list could come from an API, a database query, or a directory listing, so the count is not known until the Dag runs. Next, we define process_file, the task that handles one file. Notice the max_active_tis_per_dagrun parameter set to 2. At most two mapped instances run at the same time. With three files, two process in parallel first, and the third waits for a slot. This is useful when we need to limit concurrency, for example, to avoid overwhelming an API or a database. Finally, we call process_file.expand with the files list. Airflow creates one task instance per item at runtime. The number of instances is determined by the data, not by the code.

4. Mapped tasks in the UI

In the Airflow UI, mapped task instances appear as indexed entries under a single task. Each instance gets its own index number, and we can click into any one to see its logs, status, and execution details. This keeps the Dag readable even when it fans out into hundreds of parallel instances.

5. .partial()

Sometimes every mapped instance needs a shared parameter, like a base URL or a connection string. Here, our process_file task takes two arguments: a path that varies per instance and an output_dir that should be the same for all of them. We use partial to fix the shared output directory to "/out" for every instance, then chain expand to vary the file path. The visual below shows how this works: partial locks in the constant, expand fans out the variable. The two combined give you both a shared configuration and per-item variation.

6. The cross-product problem

What if we have two lists that need to be paired? Here we have three files and three destinations. If we pass both to expand, Airflow creates the cross product, meaning three files times three destinations, giving us nine task instances instead of three. What we actually want is for file A to go to destination A, file B to destination B, and file C to destination C.

7. .zip()

The zip function solves this. Instead of passing both lists to expand, we call files.zip with destinations. This pairs items by position: the first file with the first destination, the second with the second, and so on. That gives us three instances, not nine. We pass the zipped pairs to expand_kwargs, a variant of expand that unpacks each pair into keyword arguments. So the first instance receives path equals "/data/a.csv" and dest equals "s3://out/a", and the pattern continues for each pair.

8. When to use what

To wrap up, expand maps over a single list, creating one task instance per item. Partial fixes shared parameters so every instance gets the same value. And zip pairs multiple lists by position, avoiding the cross product. Together, they let us build Dags that scale with our data. When the number of files, endpoints, or records changes, we don't touch the code because the task count adjusts automatically at runtime.

9. Let's practice!

Time to map some tasks dynamically.

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.