
Airflow scheduling

1. Airflow scheduling

Welcome back! We've spent most of this chapter implementing Airflow tasks within our workflows. Now let's take a look at what's required to schedule those workflows and have them run automatically.

2. DAG Runs

When referring to scheduling in Airflow, we must first discuss a DAG run. This is an instance of a workflow at a given point in time. For example, it could be the currently running instance, or it could be a run from last Tuesday at 3pm. A DAG can be run manually, or automatically via the schedule_interval parameter passed when the DAG is defined. Each DAG run maintains a state for itself and the tasks within it. A DAG run can have a running, failed, or success state. The individual tasks can have these states or others as well (e.g., queued, skipped).
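As a minimal sketch using the Airflow 1.x import style (the DAG id and dates here are illustrative), the schedule interval is passed when the DAG object is created; each resulting run, whether scheduled or triggered manually, is a separate DAG run with its own state:

from datetime import datetime
from airflow.models import DAG

# Every execution of this DAG - scheduled or manual - is its own DAG run,
# tracking a state for the run and for each task within it.
etl_dag = DAG(
    'example_etl',                    # illustrative DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval='@daily',       # create a DAG run once per day
)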

3. DAG Runs view

Within the Airflow UI, you can view all DAG runs under the Browse: DAG Runs menu option. This page provides assorted details about any DAGs that have run within the current Airflow instance.

4. DAG Runs state

As mentioned, you can view the state of each DAG run on this page, which indicates whether the run was successful or not.

5. Schedule details

When scheduling a DAG, there are several attributes to consider, depending on your scheduling needs. The start date specifies the first time the DAG could be scheduled and is typically defined with a Python datetime object. The end date represents the last possible time to schedule the DAG. Max tries represents how many times to retry before fully failing the DAG run. The schedule interval represents how often to schedule the DAG for execution. There are many nuances to this, which we'll cover in a moment.
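Putting these attributes together, here is a hedged sketch (the DAG id and dates are illustrative, and mapping "max tries" to the retries default argument is an assumption about how you'd express it in code):

from datetime import datetime
from airflow.models import DAG

default_args = {
    'start_date': datetime(2020, 2, 25),  # earliest the DAG could be scheduled
    'end_date': datetime(2020, 12, 31),   # last possible time to schedule it
    'retries': 3,                         # assumed stand-in for "max tries"
}

scheduled_dag = DAG(
    'schedule_details_example',           # illustrative DAG id
    default_args=default_args,
    schedule_interval='@daily',           # how often to schedule runs
)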

6. Schedule interval

The schedule interval represents how often to schedule DAG runs, and the scheduling occurs between the start date and the potential end date. Note that these are not times when the DAGs will absolutely run, but rather the minimum and maximum bounds on when they could be scheduled. The schedule interval can be defined in a couple of ways: with cron-style syntax or via built-in presets.

7. cron syntax

The cron syntax is the same as the format for scheduling jobs with the Unix cron tool. It consists of five fields separated by spaces: the minute (0 through 59), the hour (0 through 23), the day of the month (1 through 31), the month (1 through 12), and the day of the week (0 through 6). An asterisk in any field means run for every interval; for example, an asterisk in the minute field means run every minute. A field can also be given a comma-separated list of values.
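As a quick reference (this is standard cron layout, not anything Airflow-specific), the five fields line up like this:

# ┌───────── minute (0-59)
# │ ┌─────── hour (0-23)
# │ │ ┌───── day of the month (1-31)
# │ │ │ ┌─── month (1-12)
# │ │ │ │ ┌─ day of the week (0-6)
# │ │ │ │ │
# * * * * *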

8. cron examples

The cron entry 0 12 * * * means run daily at noon (12:00). The entry * * 25 2 * represents running once per minute, but only on February 25th. The entry 0,15,30,45 * * * * means run every 15 minutes.
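For instance, the noon schedule from the first example can be passed directly as the schedule_interval (the DAG id and start date are illustrative):

from datetime import datetime
from airflow.models import DAG

# Runs once per day at 12:00, per the cron entry above.
report_dag = DAG(
    'daily_noon_report',              # illustrative DAG id
    start_date=datetime(2020, 2, 25),
    schedule_interval='0 12 * * *',
)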

9. Airflow scheduler presets

Airflow has several presets, or shortcut syntax options, representing often-used time intervals. The @hourly preset means run once an hour at the beginning of the hour. It's equivalent to 0 * * * * in cron. The @daily, @weekly, @monthly, and @yearly presets behave similarly.
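For reference, these are the cron equivalents of the presets, followed by a small illustrative DAG using one of them:

# @hourly  -> 0 * * * *
# @daily   -> 0 0 * * *
# @weekly  -> 0 0 * * 0
# @monthly -> 0 0 1 * *
# @yearly  -> 0 0 1 1 *

from datetime import datetime
from airflow.models import DAG

hourly_dag = DAG(
    'hourly_example',                 # illustrative DAG id
    start_date=datetime(2020, 2, 25),
    schedule_interval='@hourly',      # same as '0 * * * *'
)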

10. Special presets

Airflow also has two special presets for schedule intervals. None means never schedule the DAG; it is used for workflows that are only triggered manually. @once means schedule the DAG only once.
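In code, note that None is the Python None value, not a string (the DAG ids and dates here are illustrative):

from datetime import datetime
from airflow.models import DAG

manual_dag = DAG(
    'manual_only',                    # illustrative DAG id
    start_date=datetime(2020, 2, 25),
    schedule_interval=None,           # never scheduled; trigger manually
)

one_shot_dag = DAG(
    'run_once',                       # illustrative DAG id
    start_date=datetime(2020, 2, 25),
    schedule_interval='@once',        # scheduled exactly one time
)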

11. schedule_interval issues

Scheduling DAGs has an important nuance to consider. When scheduling DAG runs, Airflow uses the start date as the earliest possible value, but it does not actually schedule anything until at least one schedule interval has passed beyond the start date. Given a start_date of February 25, 2020 and a @daily schedule interval, Airflow would use February 26, 2020 for the first run of the DAG: the run happens once the February 25 interval has completed. This can be tricky to consider when adding new DAG schedules, especially if they have longer schedule intervals.
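A small sketch of the example from this slide (the DAG id is illustrative):

from datetime import datetime
from airflow.models import DAG

# With this start_date and a @daily interval, the first DAG run is not
# created until one full interval has elapsed - i.e., on February 26, 2020.
nuance_dag = DAG(
    'schedule_nuance_example',        # illustrative DAG id
    start_date=datetime(2020, 2, 25),
    schedule_interval='@daily',
)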

12. Let's practice!

We've covered a lot of ground in this lesson and chapter. Let's practice scheduling our workflows and we'll see each other back in chapter 3.