Sharing data between tasks
1. Sharing data between tasks
You already know that Airflow tasks can share data through XCom. Now let's look at how the TaskFlow API simplifies that process, and what to do when basic XCom reaches its limits.
2. What is XCom?
Let's start with a quick recap. XCom, short for cross-communication, is how Airflow passes data between tasks. When a task produces a value, Airflow stores it in the metadata database. A downstream task can then pull that value and use it. With classic operators, you call xcom_push and xcom_pull explicitly. That works, but the TaskFlow API gives you a much simpler way.
3. XComArgs: the TaskFlow way
In the TaskFlow API, when you call a @task function inside a DAG, the call does not return the function's value. It returns an XComArg. Think of it as a lazy reference: it doesn't hold the actual data, just a pointer to where the data will be stored once the task has run. When you pass that reference into another @task function, Airflow creates the dependency and resolves the value at runtime.
4. XComArgs in code
Here is what that looks like in code. The get_config function returns a dictionary, but calling it in the DAG gives us an XComArg that stands in for that dictionary. When we pass it into run_pipeline, Airflow knows that run_pipeline depends on get_config, and at runtime, it pulls the actual dictionary from XCom. You get dependency wiring and data passing in a single line.
5. Mixing classic and TaskFlow
The XComArg pattern also works when mixing classic operators with TaskFlow tasks. Every operator instance has an .output attribute that returns an XComArg. So you can pass a BashOperator's output directly into a @task function. Airflow creates the dependency and resolves the value, just like between two TaskFlow tasks.
6. XCom limitations
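Two of the limits covered in this section, serializability and payload size, can be checked before a value is ever returned from a task. The sketch below is a rough pre-flight check, not part of Airflow: xcom_fit_report is a hypothetical helper, json.dumps is only a stand-in for Airflow's own serializer (which accepts some additional types), and the default budget simply reuses the 64 KB figure quoted for MySQL.

```python
import io
import json

# Assumed budget: the 64 KB figure quoted for MySQL in this section.
XCOM_BUDGET_BYTES = 64 * 1024


def xcom_fit_report(value, budget=XCOM_BUDGET_BYTES):
    """Return (fits, reason) for a candidate XCom value (hypothetical helper)."""
    try:
        payload = json.dumps(value).encode("utf-8")
    except (TypeError, ValueError) as exc:
        return False, f"not JSON serializable: {exc}"
    if len(payload) > budget:
        return False, f"{len(payload)} bytes exceeds budget of {budget}"
    return True, f"{len(payload)} bytes"


# A small config dictionary fits comfortably.
small = xcom_fit_report({"path": "s3://bucket/data.csv", "rows": 100})

# A large list serializes fine but blows past the budget.
large = xcom_fit_report(list(range(100_000)))

# A file-like handle cannot be serialized at all.
handle = xcom_fit_report(io.BytesIO(b"raw bytes"))

for name, report in [("small", small), ("large", large), ("handle", handle)]:
    print(name, report)
```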
XCom has limits. The maximum value size depends on your database backend: PostgreSQL allows up to 1 GB, SQLite up to 2 GB, but MySQL only 64 KB. The data must also be JSON serializable. That means it works with dictionaries, lists, strings, and numbers, but not with objects like database connections or file handles. A few richer types, such as Pandas dataframes, are supported as exceptions. And every XCom value adds load to your metadata database. For small configuration values or file paths, XCom works great. For large datasets, it does not.
7. Custom XCom backends
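The size-threshold routing discussed in this section can be pictured with a toy model. This is an illustrative sketch only, not the common-io provider's implementation: route_xcom_value, the 1 MiB default threshold, and the "database" and "object_storage" labels are all hypothetical.

```python
import json


def route_xcom_value(value, threshold_bytes=1024 * 1024):
    """Decide where a value would be stored under a size threshold (toy model)."""
    size = len(json.dumps(value).encode("utf-8"))
    # Values up to the threshold stay in the metadata database;
    # anything larger would be written to object storage instead.
    target = "database" if size <= threshold_bytes else "object_storage"
    return target, size


# A short file path is far below the threshold.
small_target, small_size = route_xcom_value({"path": "/tmp/export.csv"})

# Roughly 2 MB of strings is above the assumed 1 MiB threshold.
large_target, large_size = route_xcom_value(["x" * 100] * 20_000)

print(small_target, small_size)
print(large_target, large_size)
```

In a real deployment this behaviour is configured rather than coded. The provider's documentation describes an XComObjectStorageBackend class with xcom_objectstorage_path and xcom_objectstorage_threshold settings; those names are worth checking against the provider version you have installed.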
For large values, custom XCom backends come in. Instead of storing values in the metadata database, you can route them to object storage like S3, GCS, or Azure Blob Storage. The common-io provider even lets you set a size threshold: small values stay in the database, large ones go to object storage automatically. This keeps your metadata database lightweight while letting tasks share larger datasets.
8. Let's practice!
Time to pass some data between tasks.