Airflow operators

1. Airflow operators

Welcome back! We've discussed the basics of tasks in Airflow without covering exactly what a task consists of. The most common type of task in Airflow is the operator. Let's take a look at operators now.

2. Operators

Airflow operators represent a single task in a workflow. This can be any type of task, from running a command or sending an email to running a Python script, and so on. Typically, Airflow operators run independently, meaning that all resources needed to complete the task are contained within the operator. Generally, Airflow operators do not share information with each other. This simplifies workflows and allows Airflow to run the tasks in the most efficient manner. It is possible to share information between operators, but the details are beyond the scope of this course. Airflow contains many different operators to perform different kinds of tasks. For example, the EmptyOperator can be used to represent a task for troubleshooting or a task that has not yet been implemented. We are focusing on the BashOperator for this lesson but will look at the PythonOperator and several others later on.
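As a quick illustration, here is a minimal sketch of an EmptyOperator used as a placeholder, assuming the Airflow 2.3+ import path; the DAG id, start date, and task id are made up for this example.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# DAG id and start date are hypothetical placeholders
with DAG("placeholder_example", start_date=datetime(2024, 1, 1)) as dag:
    # EmptyOperator performs no work - handy as a stand-in for a task
    # that is not implemented yet, or when troubleshooting a workflow
    not_implemented_yet = EmptyOperator(task_id="not_implemented_yet")
```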

3. BashOperator

The BashOperator executes a given Bash command or script. This command can be pretty much anything Bash is capable of that would make sense in a given workflow. The BashOperator requires two arguments: the task_id, which is the name that shows up in the UI, and the bash_command, which is the raw command or script to run. Note that the operator would appear inside a with DAG code block. If you're using older DAG definition methods, it would require a third argument, dag, which defines which DAG it belongs to. An example is given below for illustration. The BashOperator runs the command in a temporary directory that gets automatically cleaned up afterwards. It is possible to specify environment variables for the bash command to try to replicate running the task as you would on a local system. If you're unfamiliar with environment variables, these are run-time settings interpreted by the shell. This provides flexibility for running scripts in a generalized way. The first example runs a bash command that echoes "Example!" to standard out. The second example uses a predefined bash script, runcleanup.sh, for its command.
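Here is a minimal sketch of what those two examples could look like inside a with DAG code block, assuming Airflow 2.x import paths; the DAG id, start date, and task ids are placeholders for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# DAG id, start date, and task ids are placeholders for illustration
with DAG("bash_examples", start_date=datetime(2024, 1, 1)) as dag:
    # First example: the two required arguments, a task_id (shown in the UI)
    # and a raw bash command that echoes to standard out
    example_task = BashOperator(
        task_id="bash_example",
        bash_command='echo "Example!"',
    )

    # Second example: run a predefined bash script instead of a raw command
    # (Airflow looks up a bash_command ending in .sh as a script file
    # relative to the DAG folder and renders it as a template)
    cleanup_task = BashOperator(
        task_id="bash_script_example",
        bash_command="runcleanup.sh",
    )

# Older, non-context-manager DAG definitions pass the DAG explicitly instead:
#   BashOperator(task_id="bash_example", bash_command='echo "Example!"', dag=dag)
```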

4. BashOperator examples

Before using the BashOperator, it must be imported from airflow.operators.bash. The first example creates a BashOperator that takes a task_id and runs the bash command "echo 1". The second example is a BashOperator that runs a quick data cleaning operation using cat and awk. Don't worry if you don't understand exactly what this command is doing. This is a common scenario when running workflows: you may not know exactly what a command does, but you can still run it in a reliable way.
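A sketch of these two examples follows, again assuming Airflow 2.x import paths; the file names and the awk expression in the cleaning command are illustrative, not taken from a real workflow.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("bashoperator_examples", start_date=datetime(2024, 1, 1)) as dag:
    # First example: a task_id plus the bash command "echo 1"
    bash_task = BashOperator(
        task_id="bash_example",
        bash_command="echo 1",
    )

    # Second example: a quick data cleaning step using cat and awk;
    # the file names and awk filter are illustrative only
    clean_task = BashOperator(
        task_id="clean_addresses",
        bash_command='cat addresses.txt | awk "NF == 10" > cleaned.txt',
    )
```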

5. Operator gotchas

There are some general gotchas when using operators. The biggest is that individual operators are not guaranteed to run in the same location or environment. This means that just because one operator ran in a given directory with a certain setup, it does not necessarily mean that the next operator will have access to that same information. If this is required, you must explicitly set it up. You may need to set up environment variables, especially for the BashOperator. For example, it's common in bash to use the tilde character to represent a home directory, but this is not defined by default in Airflow. Other examples of environment variables include AWS credentials, database connectivity details, or other information specific to running a script. Finally, it can also be tricky to run tasks with any form of elevated privileges. This means that any access to resources must be set up for the specific user running the tasks. If you're uncertain what elevated privileges are, think of running a command as root or as the administrator on a system.
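One way to handle this with the BashOperator is its env argument, sketched below; the variable names and values are placeholders, and the exact merge behavior of env varies across Airflow versions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("env_gotcha_example", start_date=datetime(2024, 1, 1)) as dag:
    # Explicitly pass the environment variables the command relies on,
    # since the worker's environment may not define them for you.
    # Variable names and values here are illustrative placeholders.
    report_task = BashOperator(
        task_id="run_report",
        bash_command="echo $DATA_DIR",
        env={"DATA_DIR": "/opt/data", "AWS_PROFILE": "analytics"},
        # Note: by default, supplying env replaces the inherited environment;
        # newer Airflow versions offer append_env=True to merge instead.
    )
```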

6. Let's practice!

We've discussed the basics of Airflow operators. Let's practice using them in some workflows now.