Airflow operators
1. Airflow operators
Welcome back! Now that we've covered Airflow task basics, let's dive into operators, the most common type of task in Airflow.
2. Operators
Airflow operators represent a single task in a workflow. This can be any type of task, like running Python code, running a Bash command, or sending an email. Airflow operators typically run independently, meaning that all resources needed to complete the task are contained within the operator. Generally, Airflow operators do not share information with each other. This simplifies workflows and allows Airflow to run the tasks in the most efficient manner. It is possible to share information between operators, which we will cover later on. Airflow contains many operators to perform different tasks. In this lesson, we will focus on the PythonOperator and BashOperator.
3. @task (PythonOperator)
Our primary operator is the PythonOperator, which executes a Python function within a Dag. Applying the @task decorator to a Python function creates this operator for that function. It is possible to pass data between tasks, but we will cover how to do this in a later lesson.
4. @task arguments
You can pass arguments to a task, just like a normal Python function. In this case, the printme function is called with name="DataCamp", and its output shows up in the Airflow logs.
5. @task.bash (BashOperator)
The BashOperator executes a given Bash command or script, using the @task.bash decorator. This command can be pretty much anything Bash is capable of that would make sense in a given workflow. The BashOperator runs the command in a temporary directory that gets automatically cleaned up afterwards. It is possible to specify environment variables for the Bash command, to replicate running the task as you would on a local system. If you're unfamiliar with environment variables, these are run-time settings interpreted by the shell. Using environment variables lets you keep scripts general while adjusting their behavior for each run. The first example runs the Bash echo command to print the text "Example!". The second example uses a predefined Bash script, runcleanup.sh, as its command.
6. Task dependencies
Once we've defined our workflows, each Dag will have a set of tasks that must be completed. We use task dependencies to specify the order in which tasks should run. There are different methods for specifying dependencies. We'll cover one now, and another later in this chapter.
7. Bitshift syntax
The most common dependency method is the bitshift syntax, which uses the >> and << operators. It is considerably more common to use just the >> operator, so we'll focus on that for these examples. Our first example shows task1 >> task2, which means task1 must complete before task2 runs. If task1 fails, task2 will never run. We can also chain dependencies, in this case task1 then task2 then task3. Task1 completes before task2 runs, and task2 completes before task3 runs. Our last example shows that task3 is dependent on task1 and task2, but in a different way. In this case, task1 and task2 can run simultaneously if Airflow decides to, and, when both are complete, task3 will run.
8. Bitshift syntax example
Consider a Dag that reconciles sales and inventory data. Both datasets must be downloaded before reconciling, but the order in which the downloads complete isn't specified. We can have the download tasks run simultaneously, but both must finish before the reconcile task starts.
9. Let's practice!
We've discussed the basics of Airflow operators. Let's practice using them in some workflows.