Airflow tasks
1. Airflow tasks
Welcome back! Now that you've worked with operators a bit, let's take a look at the concept of tasks within Airflow.2. Tasks
Within Airflow, tasks are instantiated operators. It basically is a shortcut to refer to a given operator within a workflow. Tasks are usually assigned to a variable within Python code. Using a previous example, we assign the BashOperator to the variable example underscore task. Note that within the Airflow tools, this task is referred by its task id, not the variable name.3. Task dependencies
Task dependencies in Airflow define an order of task completion. While not required, task dependencies are usually present. If task dependencies are not defined, task execution is handled by Airflow itself with no guarantees of order. Task dependencies are referred to as upstream or downstream tasks. An upstream task means that it must complete prior to any downstream tasks. Since Airflow 1.8, task dependencies are defined using the bitshift operators. The upstream operator is two greater-than symbols. The downstream operator is two less-than symbols.4. Upstream vs Downstream
It's easy to get confused on when to use an upstream or downstream operator. The simplest analogy is that upstream means before and downstream means after. This means that any upstream tasks would need to complete prior to any downstream ones.5. Simple task dependency
Let's look at a simple example involving two bash operators. We define our first task, and assign it to the variable task1. We then create our second task and assign it to the variable task2. Once each operator is defined and assigned to a variable, we can define the task order using the bitshift operators. In this case, we want to run task1 before task2. The most readable method for this is using the upstream operator, two greater-than symbols, as task1 upstream operator task2. Note that you could also define this in reverse using the downstream operator to accomplish the same thing. In this case, it'd be task2 two less-than symbols task1.6. Task dependencies in the Airflow UI
Let's take a look at what the Airflow UI shows for tasks and their dependencies. In this case, we're looking at the graph view within the Airflow web interface.7. Task dependencies in the Airflow UI
Note that in the task area, our two tasks, first_task and second_task, are both present, but there is no order to the task execution. This is the DAG prior to setting the task dependency using the bitshift operator.8. Task dependencies in the Airflow UI
Now let's look again at the view with a defined order via the bitshift operators. The view is similar but we can see the order of tasks indicated by the connection between first underscore task and second underscore task.9. Multiple dependencies
Dependencies can be as complex as required to define the workflow to your needs. We can chain a dependency, in this case setting task1 upstream of task2 upstream of task3 upstream of task4. The Airflow graph view shows a dependency view indicating this order. You can also mix upstream and downstream bitshift operators in the same workflow. If we define task1 upstream of task2 then downstream of task3, we get a configuration different than what we might expect. This creates a DAG where first underscore task and third underscore task must finish prior to second underscore task. This means we could define the same dependency graph on two lines, in a possibly clearer form. task1 upstream of task2. task3 upstream of task2. Note that because we don't require it, either task1 or task3 could run first depending on Airflow's scheduling.10. Let's practice!
There are many intricacies to defining tasks and using the bitshift operators. The best way to solidify these is practice in the exercises.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.