
Passing data between tasks with XCom

1. Passing data between tasks with XCom

Great job working through these topics! Let's take a look at how to pass data between our Airflow tasks with XCom.

2. What is XCom?

XCom is short for cross-communication. It exists to allow tasks to talk to each other and pass data between them. By default, XComs are stored in the Airflow metadata database, which means they should be kept fairly small. Good candidates include filenames, URIs, or row counts.

3. What not to send via XCom

Because XCom is designed for small amounts of data, there are many things you should not use it for, including sending large files, dataframes, databases, or large images. If the data is stored in an accessible place, however, you can send pointers to that data, like filenames or URIs, as previously mentioned.
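To make the "send a pointer, not the data" idea concrete, here is a minimal sketch in plain Python (no Airflow required). The helper names `json_size_bytes` and `save_and_point`, the sample rows, and the file name are all hypothetical, and the JSON-encoded size is only a rough proxy for what would be stored, assuming the default JSON serialization.

```python
import json
import tempfile
from pathlib import Path


def json_size_bytes(value) -> int:
    """Rough size of a value once JSON-encoded."""
    return len(json.dumps(value).encode("utf-8"))


def save_and_point(rows: list, out_dir: Path) -> dict:
    """Write rows to a file and return a small pointer to it.

    In a real deployment out_dir would be shared storage that every
    worker can reach; a local directory is used here only for the demo.
    """
    path = out_dir / "rows.json"
    path.write_text(json.dumps(rows))
    return {"path": str(path), "row_count": len(rows)}


# Hypothetical dataset: the kind of payload that should not go through XCom.
rows = [{"id": i, "value": i * 2} for i in range(10_000)]

with tempfile.TemporaryDirectory() as tmp:
    pointer = save_and_point(rows, Path(tmp))
    # A downstream task would reopen the file using the pointer.
    reloaded = json.loads(Path(pointer["path"]).read_text())

print(json_size_bytes(rows), "bytes if the rows were sent directly")
print(json_size_bytes(pointer), "bytes if only the pointer is sent")
```

The pointer dict (a path plus a row count) is the sort of value a task could reasonably return, while the rows themselves stay in storage.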

4. Implementing XCom

There are many ways to use XCom in Airflow, with varying levels of difficulty. We're going to focus on using the TaskFlow API to implement our XComs. This is an extension of what we've already done using the @task decorator.

5. XCom example

Let's look at a simple XCom example. Consider a Dag designed to get and clean a small amount of data. After defining our Dag, we create a task that returns a small amount of data, like a Python dict. We then define another task that cleans the data. There are two things to note here. First, look at the function definition: it accepts a parameter called sourcedata, which is how the data is passed into the task. Second, note the parameter multiple_outputs=True on the @task decorator. With it, each key of the returned dict is stored as its own XCom, which makes individual values easier to use later on.

To actually create the XCom, we pass the output of one task (get_data) into the other (clean_data). The syntax is the same as for nested function calls, in this case clean_data(get_data()).
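The example described above might look roughly like the sketch below. It is not the course's exact code: the Dag name `xcom_example`, the sample record, the helpers `fetch_raw` and `clean`, and the scheduling arguments are assumptions. The import is guarded because the decorators live in `airflow.sdk` in Airflow 3 and in `airflow.decorators` in Airflow 2; if Airflow is not installed, only the plain helper functions are defined.

```python
from datetime import datetime

try:
    from airflow.sdk import dag, task  # Airflow 3 location
except ImportError:
    try:
        from airflow.decorators import dag, task  # Airflow 2 location
    except ImportError:
        dag = task = None  # Airflow not installed; the Dag below is skipped


def fetch_raw() -> dict:
    """Return a small, JSON-serializable record (hypothetical sample data)."""
    return {"name": "  Ada Lovelace ", "visits": "3"}


def clean(sourcedata: dict) -> dict:
    """Strip whitespace from the name and convert the visit count to int."""
    return {"name": sourcedata["name"].strip(), "visits": int(sourcedata["visits"])}


if task is not None:

    @dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
    def xcom_example():
        @task
        def get_data() -> dict:
            # The return value is stored as an XCom.
            return fetch_raw()

        @task(multiple_outputs=True)
        def clean_data(sourcedata: dict) -> dict:
            # Each key of the returned dict becomes its own XCom entry.
            return clean(sourcedata)

        # Same syntax as nesting ordinary function calls.
        clean_data(get_data())

    xcom_example()
```

Keeping the cleaning logic in ordinary functions is a design choice for this sketch: it lets the logic be run and checked without a scheduler, while the decorated tasks stay thin wrappers.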

6. XCom dependencies

Using XCom in this way has another benefit: it implicitly defines a dependency order between the tasks. In this case, clean_data(get_data()) is conceptually similar to get_data() >> clean_data(). Effectively, clean_data can't run until get_data has retrieved the information. As a further example, we sometimes want to connect XCom tasks with non-XCom tasks, meaning tasks that don't depend on other tasks' data. One way to do this is to assign the output of the XCom tasks to a name, in this case result. We can then use result to refer to the XCom tasks and define the task order for the alert_when_complete task.
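A sketch of that ordering, under the same assumptions as before: the Dag name `xcom_dependencies`, the sample record, the `completion_message` helper, and the scheduling arguments are hypothetical, and the guarded import covers both the Airflow 3 and Airflow 2 locations of the decorators. The line `result >> alert_when_complete()` is one plausible way to express the order the text describes.

```python
from datetime import datetime

try:
    from airflow.sdk import dag, task  # Airflow 3 location
except ImportError:
    try:
        from airflow.decorators import dag, task  # Airflow 2 location
    except ImportError:
        dag = task = None  # Airflow not installed; the Dag below is skipped


def completion_message() -> str:
    """Text for the alert; it uses no data from the other tasks."""
    return "get_data and clean_data have finished"


if task is not None:

    @dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
    def xcom_dependencies():
        @task
        def get_data() -> dict:
            return {"name": "  Ada Lovelace ", "visits": "3"}

        @task(multiple_outputs=True)
        def clean_data(sourcedata: dict) -> dict:
            return {
                "name": sourcedata["name"].strip(),
                "visits": int(sourcedata["visits"]),
            }

        @task
        def alert_when_complete() -> None:
            print(completion_message())

        # Passing get_data() into clean_data already orders those two tasks.
        result = clean_data(get_data())
        # result stands for the XCom tasks, so the alert runs after them.
        result >> alert_when_complete()

    xcom_dependencies()
```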

7. Viewing XCom data

We may wish to review the data stored as XComs in our Dags. A quick way to do this is to choose the Browse > XComs menu option. This brings us to a page where we can explore the data by Key, Dag, Task, and more.

8. Let's practice!

Let's practice now!
