Airflow core concepts
1. Airflow core concepts
Welcome! Let's take your Airflow skills to the next level.
2. Meet your instructor
I'm Volker Janz. I spent over 14 years as a data engineer in the gaming industry, building and scaling data platforms from the ground up. I've been working with Airflow since version 1.x, so I've seen the tool grow and mature over the years. Today I'm a Developer Advocate at Astronomer, where I help data teams get the most out of Airflow. I'm excited to share what I've learned with you.
3. What you'll build
By the end of this course, you'll know how to author Dags with the TaskFlow API, build dynamic workflows with task mapping and asset-based scheduling, handle failures with retries and callbacks, and run SQL workloads through Airflow.
4. Before we start
This course picks up where the introductory Airflow course left off. You should already be comfortable with Dags, tasks, operators, and the basics of scheduling. If any of that sounds unfamiliar, take the intro course first.
5. Quick refresher
Let's start with a quick refresher. A Dag is a collection of tasks with dependencies between them. Tasks are individual units of work, and operators, or decorators like @task, define what each task does. Dependencies set the order tasks run in. In this example, we import dag and task from the Airflow SDK. The @dag decorator marks the outer function as a Dag, and @task marks each step inside it. The first task fetches data from an API, and the second uses @task.bash to print the result. Passing the return value from one task to the next wires the dependency automatically, and that is the pattern you'll use in your first exercise.
6. Airflow architecture
Airflow has six core components, organized into three groups. The Scheduler and Dag Processor handle orchestration: deciding when tasks run and parsing your Dag files. Workers and the Triggerer handle execution: actually running your task code. And the API Server and Metadata Database provide the interface and storage layer: the UI you interact with and the state Airflow keeps track of.
7. Scheduling approaches
Airflow gives you three ways to schedule a Dag. Setting schedule to None means the Dag only runs when you trigger it manually. That is useful for testing or event-driven patterns. A cron expression such as 0 6 * * * runs the Dag every day at 6 AM. And you can schedule based on data availability using Assets, which we'll explore in Chapter 2. By default, catchup is off, so deploying a Dag with a past start date won't backfill missed runs.
8. Two ways to write Dags
There are two ways to write Dags. On the left, the classic approach: you create operator instances and wire dependencies manually. On the right, the TaskFlow API: you use Python decorators, and dependencies are created automatically when you pass return values between functions. Both work, and classic operators are still the right choice when a provider doesn't offer decorators. But TaskFlow is more Pythonic and cuts a lot of boilerplate. You can also combine both approaches seamlessly, and we'll go deep on TaskFlow in the next lesson.
9. Exercises in this course
Before we jump into the exercise, here is one note. This course has lots of IDE exercises where you edit real Python files, just like you would in production. To run your code, you can either click the Run file button or type python3 followed by the filename in the terminal. Each file calls dag.test() at the bottom, which runs the full Dag in a single process, so you see actual task output. Just make sure the run finishes before you submit your solution. We'll cover .test() in depth in Chapter 3, but for now, it's how you'll run and verify your Dags.
10. Let's practice!
Time for practice.