
Branching

1. Branching

Nice work so far. We're nearing the end of this Airflow introduction. Let's take a look at another important concept, branching.

2. Branching

Branching provides conditional, if / then logic within Airflow. This means that tasks can be selectively executed or skipped depending on the return value of a branching task. We'll use the @task.branch decorator to define a branched workflow. @task.branch works by running a Python function that returns the task_id, or a list of task_ids, of the task or tasks to run next. This is best seen with an example.

3. Branching example

Let's consider a scenario where we need to run a special task if it's the end of a sales quarter, typically March, June, September, and December. We've already defined our Dag and imported all the necessary libraries. Our first job is to create the function used with @task.branch. In the function we first access the logical_date value. We can access the month with logical_date.month. We convert this value to an integer, and then check whether the value modulo 3 equals 0. Here, we're checking if the month number is evenly divisible by 3. If so, it's the end of a quarter; otherwise, it's a regular month. As such, we return either end_of_quarter_task or regular_monthly_task. One quick note on logical_date: Airflow automatically maps the variables we need from the context object into our functions. We could replace this with any of the other values we've worked with, including ds, ds_nodash, params, or even var.
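As a rough illustration of the check described above, here is a minimal sketch in plain Python. The helper name choose_branch is hypothetical and is not part of Airflow; in a real Dag file the same logic would sit inside a function decorated with @task.branch, as outlined in the comment. The example dates are made up for illustration.

```python
from datetime import datetime


def choose_branch(logical_date: datetime) -> str:
    """Return the task_id to run next, based on the month of logical_date.

    Hypothetical helper for illustration; not an Airflow API.
    """
    month = int(logical_date.month)
    # Months 3, 6, 9 and 12 are evenly divisible by 3: end of a quarter.
    if month % 3 == 0:
        return "end_of_quarter_task"
    return "regular_monthly_task"


# In a Dag file, the decorated version might look roughly like this
# (assumption: logical_date is supplied from the Airflow context):
#
#   @task.branch
#   def branch_task(logical_date=None):
#       return choose_branch(logical_date)

print(choose_branch(datetime(2024, 3, 15)))  # end_of_quarter_task
print(choose_branch(datetime(2024, 4, 15)))  # regular_monthly_task
```

Keeping the decision in a small pure function like this makes it easy to check outside Airflow before wiring it into a branch task.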

4. Branching example

We don't show the code here, but assume we've created two tasks for end of quarter months, and two tasks for regular months. We need to set the dependencies using the bitshift syntax. First, we configure the dependency order for start_task, branch_task, then end_of_quarter_task and end_of_quarter_task2. Next, we set the dependency order for the regular_monthly tasks. We set regular_monthly_task to follow the branch_task, and regular_monthly_task2 to follow that. You may be wondering why dependencies are necessary if one set is not going to run. Without these task dependencies, all the tasks would execute normally, regardless of what the branch task returned.
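The dependency lines described above can be sketched as follows. To keep the snippet runnable without Airflow installed, it uses a toy Task class as a stand-in; this class is an assumption made for illustration and only records downstream links. The two lines using >> are written the way they would appear in a Dag file with real tasks.

```python
class Task:
    """Toy stand-in for an Airflow task; it only records downstream links."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # a >> b records b as downstream of a, then returns b so that
        # chains such as a >> b >> c link each neighbouring pair.
        self.downstream.append(other.task_id)
        return other


start_task = Task("start_task")
branch_task = Task("branch_task")
end_of_quarter_task = Task("end_of_quarter_task")
end_of_quarter_task2 = Task("end_of_quarter_task2")
regular_monthly_task = Task("regular_monthly_task")
regular_monthly_task2 = Task("regular_monthly_task2")

# End of quarter path
start_task >> branch_task >> end_of_quarter_task >> end_of_quarter_task2
# Regular month path
branch_task >> regular_monthly_task >> regular_monthly_task2

print(branch_task.downstream)
# ['end_of_quarter_task', 'regular_monthly_task']
```

Both paths hang off branch_task, which is what lets the branch result decide which one runs.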

5. Branching graph view

Let's look at the Dag in the graph view of the Airflow UI. You'll notice that we have a start_task upstream of the branch_task. The branch_task then shows two paths, one to the end_of_quarter tasks, and the other to the regular_monthly tasks.

6. Branching End of quarter months

Let's look first at what happens if we run on an end_of_quarter month. The start_task executes as normal, then the branch_task checks the logical_date.month value and determines this is an end_of_quarter month. It returns the value end_of_quarter_task, which is then executed by Airflow followed by the end_of_quarter_task2. Note that the regular_monthly tasks are marked skipped.

7. Branching Regular months

For completeness, let's look at the output from a run on a regular month. The process is the same, except that the branch task selects regular_monthly_task instead and the end_of_quarter branch is marked skipped.
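To make the two outcomes concrete, here is a small toy model of which tasks end up skipped for each branch result. It is a simplified sketch, not Airflow's scheduler: it assumes the task layout from this example, and it assumes every task downstream of an unselected branch is skipped, which matches what we see here but ignores trigger rules and paths that rejoin. The function name skipped_tasks is hypothetical.

```python
# Downstream links for the example Dag, keyed by task_id.
downstream = {
    "start_task": ["branch_task"],
    "branch_task": ["end_of_quarter_task", "regular_monthly_task"],
    "end_of_quarter_task": ["end_of_quarter_task2"],
    "end_of_quarter_task2": [],
    "regular_monthly_task": ["regular_monthly_task2"],
    "regular_monthly_task2": [],
}


def skipped_tasks(chosen, branch="branch_task"):
    """Return the task_ids skipped when the branch task returns `chosen`."""
    skipped = []
    # Start from the direct children of the branch that were not chosen,
    # then walk everything downstream of them.
    stack = [t for t in downstream[branch] if t != chosen]
    while stack:
        task_id = stack.pop()
        skipped.append(task_id)
        stack.extend(downstream[task_id])
    return sorted(skipped)


print(skipped_tasks("end_of_quarter_task"))
# ['regular_monthly_task', 'regular_monthly_task2']
print(skipped_tasks("regular_monthly_task"))
# ['end_of_quarter_task', 'end_of_quarter_task2']
```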

8. Date variables

We've discussed variables before, but we should review a couple that are commonly used in Airflow Dags for branching operations based on date. ds is the logical date of the Dag run, formatted with dashes as YYYY-MM-DD. ds_nodash is the same date without dashes, as YYYYMMDD. The start of the data interval of the previous successful run is available as prev_data_interval_start_success. It's a mouthful, but useful. There are many others available in the Airflow documentation.
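As a quick sketch of what these values look like, the snippet below builds the ds and ds_nodash strings from a made-up logical date using plain Python. Airflow supplies these values itself at run time; the date here is an assumption for illustration only.

```python
from datetime import datetime, timezone

# Made-up logical date for illustration.
logical_date = datetime(2024, 6, 30, tzinfo=timezone.utc)

ds = logical_date.strftime("%Y-%m-%d")       # format of ds
ds_nodash = logical_date.strftime("%Y%m%d")  # format of ds_nodash

print(ds)         # 2024-06-30
print(ds_nodash)  # 20240630

# Because ds is a string, the month has to be parsed out before it can
# be used in a numeric check such as month % 3 == 0.
month = int(ds.split("-")[1])
print(month % 3 == 0)  # True
```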

9. Let's practice!

Let's practice working with branches now.
