1. Airflow executors
Now that we've implemented some Airflow sensors, let's discuss the execution model within Airflow.
2. What is an executor?
In Airflow, an executor is the component that actually runs the tasks defined within your workflows.
Each executor has different capabilities and behaviors for running the set of tasks. Some may run a single task at a time on a local system, while others might split individual tasks among all the systems in a cluster. This is often referred to as the number of worker slots available.
We'll discuss some of these in more detail soon, but a few examples of executors are the SequentialExecutor, the LocalExecutor, and the KubernetesExecutor. This is not an exhaustive list, and you can also create your own executor if required (though we won't cover that in this course).
3. SequentialExecutor
The SequentialExecutor is the default execution engine for Airflow.
It runs only a single task at a time. This means having multiple workflows scheduled around the same timeframe may cause things to take longer than expected.
The SequentialExecutor is useful for debugging as it's fairly simple to follow the flow of tasks and it can also be used with some integrated development environments (though we won't cover that here).
The most important aspect of the SequentialExecutor is that while it's very functional for learning and testing, it's not really recommended for production due to the limitations of task resources.
4. LocalExecutor
The LocalExecutor is another option for Airflow that runs entirely on a single system.
It basically treats each task as a process on the local system, and is able to start as many concurrent tasks as desired / requested / and permitted by the system resources (ie, CPU cores, memory, etc).
This concurrency is the parallelism of the system, and it is defined by the user in one of two ways - either unlimited, or limited to a certain number of simultaneous tasks.
Defined intelligently, the LocalExecutor is a good choice for a single production Airflow system and can utilize all the resources of a given host system.
5. KubernetesExecutor
The last executor we'll look at is the Kubernetes executor. If you're not familiar with Kubernetes, it's a container orchestration system that allows tasks to be run on a cluster of machines.
Using a KubernetesExecutor, multiple Airflow systems can be configured as workers for a given set of workflows / tasks. You can add extra systems at any time to better balance workflows.
The power of the KubernetesExecutor is significantly more difficult to setup and configure. It requires a working Kubernetes configuration prior to configuring Airflow, not to mention some method to share DAGs between the systems (ie, a git server, Network File System, etc).
While it is more difficult to configure, the KubernetesExecutor is a powerful choice for anyone working with a large number of DAGs and / or expects their processing needs to grow.
6. Determine your executor
Sometimes when developing Airflow workflows, you may want to know the executor being used. If you have access to the command line, you can determine this by:
Looking at the appropriate airflow dot cfg file. Search for the executor equal line, and it will specify the executor in use.
Note that we haven't discussed the airflow.cfg file in depth as we assume a configured Airflow instance in this course. The airflow.cfg file is where most of the configuration and settings for the Airflow instance are defined, including the type of executor.
7. Determine your executor #2
You can also determine the executor by running airflow info from the command line. Within the first few lines, you should see an entry for which executor is in use (In this case, it's the SequentialExecutor). Make sure to scroll the output to see everything.
8. Let's practice!
We've just discussed some of the various Airflow executors - let's practice what we've learned in the exercises ahead.