Introduction to Apache Airflow
1. Introduction to Apache Airflow
Welcome to Introduction to Airflow! I'm Mike Metzger, a Data Engineer, and I'll be your instructor while we learn the components of Apache Airflow and why you'd want to use it. Let's get started!
2. What is a workflow?
Before we can really discuss Airflow, we need to talk about workflows. A workflow is a set of steps to accomplish a given data engineering process. These can include any given task, such as downloading a file, copying data, filtering information, writing to a database, and so forth. A workflow can vary in complexity. Some workflows may only have 2 or 3 steps, while others consist of hundreds of components. The complexity of a workflow is completely dependent on the needs of the user. It's important to note that we're defining a workflow here in a general data engineering sense. As you'll see later, the word "workflow" can have a more specific meaning within particular tools.
3. What is Airflow?
Airflow is a platform for orchestrating workflows, including creating, scheduling, and monitoring them.
4. What is Airflow?
Airflow can use various tools and languages, but the actual workflow code is written in Python. Airflow implements workflows as Dags. We'll discuss exactly what this means throughout this course, but for now think of a Dag as a set of tasks and the dependencies between them. Airflow can be accessed and controlled through a built-in web interface, code, the command line, or the REST API. We'll look at all of these options later on. We typically use Airflow for processes such as ETL pipelines, ML workflows, automation, and so forth.
5. Quick introduction to Dags
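To make "a set of tasks and the dependencies between them" concrete, here is a minimal sketch in plain Python. It uses the standard library's graphlib module rather than Airflow's own API, and the task names are hypothetical, chosen only for illustration. It shows how a list of dependencies is enough to work out an order in which the tasks can run.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (not from the course). Each task maps to the
# set of tasks that must finish before it can start.
dependencies = {
    "download_file": set(),
    "filter_data": {"download_file"},
    "write_to_database": {"filter_data"},
    "send_report": {"filter_data"},
}


def execution_order(deps):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In this sketch, download_file always comes first and filter_data second. The last two tasks depend only on filter_data, so either may run before the other. Airflow's scheduler handles this kind of ordering for us, along with scheduling, retries, and monitoring.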
A Dag is a model that represents everything needed to execute a workflow. It consists of the tasks and the dependencies between tasks. Dags are created with various details including the name, owner, email alerting options, and more.
6. Airflow components
Airflow consists of several components that can be mixed and matched depending on requirements. We'll go into detail later in the course, but some of the common ones are: the scheduler, which triggers scheduled workflows and submits tasks; the API server, which provides a consistent way to interact with Airflow; the Dag processor, which represents the workflows to the scheduler; and the metadata database, which stores state information about Dags and their tasks.
7. Running a workflow in Airflow UI
We'll go over more of the Airflow UI in the next lesson and further throughout the course. Before that, we're going to learn how to trigger an Airflow Dag with the UI. This is the Airflow Dags view with the Dags loaded into the system. Currently, we have two Dags loaded and several options to interact with them.
8. Running a workflow in Airflow UI
To run or trigger a Dag, we click the play button (the arrow icon).
9. Running a workflow in Airflow UI
This brings up a Trigger popup; we keep the defaults here and click Trigger. This will initiate the Dag run.
10. Running a workflow in Airflow UI
When the Dag run completes successfully, we see a green checkmark next to the date and time of the run. We can click on this to reveal further details about the run.
11. Running a workflow in Airflow UI
Finally, we can see the details of the tasks and can click on an individual task to view its details, in this case generate_random_number.
12. Let's practice!
We've looked at Airflow and some of the basic aspects of why you'd use it. We've also learned how to run a Dag from the Airflow UI. Let's practice now.