
Scheduling Dags

1. Scheduling Dags

Welcome back! Let's schedule our workflows and have them run automatically.

2. Dag Runs

Scheduling in Airflow centers on the Dag run, which is an instance of a workflow at a given point in time. For example, it could be the currently running instance, or one that ran last Tuesday at 3pm. A Dag can be run manually, or automatically using the schedule parameter passed when the Dag is defined. Each Dag run maintains state for itself and its underlying tasks. A Dag run can have a running, failed, or success state. Each task can have these states, as well as queued or skipped.

3. Dag Runs view

In the Airflow UI, you can view all Dag runs under the Dags: Dag Runs menu. This provides details about any Dags that have run on the current Airflow instance.

4. Dag Runs state

You can view the state of a Dag run and whether the Dag run was successful.

5. Schedule details

Scheduling a Dag involves several attributes. The start_date specifies the earliest time the Dag could be scheduled, and the end_date represents the last possible time to schedule it. Both are Pendulum datetime objects, taking the year, month, and day as arguments, along with a timezone specified by the tz argument. Recall that Airflow uses Pendulum for timezone handling. UTC is recommended.
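As a rough illustration of these two boundaries, the sketch below builds timezone-aware UTC values with Python's standard datetime module. In a Dag you would more likely call pendulum.datetime(year, month, day, tz="UTC"). The specific dates are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical dates, chosen only for illustration.
start_date = datetime(2024, 2, 25, tzinfo=timezone.utc)  # earliest time the Dag could be scheduled
end_date = datetime(2024, 3, 31, tzinfo=timezone.utc)    # last possible time to schedule it

# Both values are timezone-aware and pinned to UTC.
print(start_date.isoformat())
print(end_date.isoformat())
```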

6. Schedule

The schedule represents how often to schedule the Dag runs. The scheduling occurs between the start_date and the potential end_date. Note this is not when the Dags will absolutely run, but a range in which they could be scheduled. The schedule can be defined in a few ways, covered next.
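One way to picture this range is as a simple window check. The helper below is a hypothetical sketch, not part of Airflow; it only shows that end_date is optional and that a candidate time must fall inside the window to be schedulable.

```python
from datetime import datetime, timezone
from typing import Optional


def in_schedule_window(
    candidate: datetime,
    start_date: datetime,
    end_date: Optional[datetime] = None,
) -> bool:
    """Return True if candidate falls inside the schedulable range."""
    if candidate < start_date:
        return False
    # end_date is optional: without it, the window stays open indefinitely.
    return end_date is None or candidate <= end_date


# Hypothetical window used only for this example.
start = datetime(2024, 2, 25, tzinfo=timezone.utc)
end = datetime(2024, 3, 31, tzinfo=timezone.utc)
print(in_schedule_window(datetime(2024, 3, 1, tzinfo=timezone.utc), start, end))
```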

7. cron syntax

The cron syntax is the same format as the Unix cron tool. It consists of five fields separated by spaces: the minute (0-59), the hour (0-23), the day of the month (1-31), the month (1-12), and the day of the week (0-6, where 0 is Sunday). An asterisk matches every value of a field. For example, an asterisk in the minute field means run every minute. A list of values can be given in a field using comma-separated values.
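To make the five fields concrete, here is a minimal sketch that expands a cron entry into the values each field allows. The function and constant names are invented for this example, and it handles only asterisks and comma-separated integers, not ranges or step values.

```python
# Field names and allowed ranges, in the order they appear in a cron entry.
CRON_FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def expand_cron(expression: str) -> dict:
    """Expand each field of a five-field cron entry into a sorted list of values."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    expanded = {}
    for part, (name, low, high) in zip(parts, CRON_FIELDS):
        if part == "*":
            # An asterisk stands for every value the field allows.
            values = list(range(low, high + 1))
        else:
            values = sorted({int(v) for v in part.split(",")})
            if values[0] < low or values[-1] > high:
                raise ValueError(f"{name} out of range {low}-{high}: {part}")
        expanded[name] = values
    return expanded


print(expand_cron("0,15,30,45 * * * *")["minute"])
```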

8. cron examples

The cron entry "0 12 * * *" means run daily at 12:00 noon. "* * 25 2 *" is once per minute, but only on February 25th. "0,15,30,45 * * * *" means run every 15 minutes.
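These three entries can be checked against concrete timestamps with a small matcher. This is a simplified sketch with invented function names: it supports only asterisks and comma-separated integers, and it requires every field to match, whereas real cron treats the day-of-month and day-of-week fields specially when both are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """True if a cron field ('*' or comma-separated integers) accepts value."""
    return field == "*" or value in {int(v) for v in field.split(",")}


def cron_matches(expression: str, moment: datetime) -> bool:
    """Check whether a five-field cron entry fires at the given minute."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0, while Python's weekday() counts Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


# The year is arbitrary; only the month, day, hour, and minute matter here.
print(cron_matches("0 12 * * *", datetime(2024, 2, 25, 12, 0)))
print(cron_matches("* * 25 2 *", datetime(2024, 2, 26, 3, 17)))
```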

9. Airflow scheduler presets

Airflow has several presets, or shortcut syntax options representing often-used time intervals. The @hourly preset means run once an hour at the beginning of the hour. It's equivalent to "0 * * * *" in cron. The other presets behave similarly.
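For reference, the common presets and their cron equivalents can be written as a lookup table. The mapping reflects the equivalents listed in the Airflow documentation as far as I know; the dictionary and helper names are invented for this sketch.

```python
# Cron equivalents of the common Airflow presets.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",   # top of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight on Sunday
    "@monthly": "0 0 1 * *",  # midnight on the first of the month
    "@yearly": "0 0 1 1 *",   # midnight on January 1st
}


def to_cron(schedule: str) -> str:
    """Translate a preset to cron; pass plain cron entries through unchanged."""
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@hourly"))
```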

10. Special presets

Airflow has three special presets. None means never schedule the Dag, and is used for manually triggered workflows. @once means only schedule a Dag once. @continuous is used to start a Dag immediately following the completion of a previous run.

11. timedelta

Another scheduling method uses Pendulum's duration object. Like timedeltas, durations can use weeks, days, hours, or minutes.
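The sketch below shows the same units with the standard library's timedelta, which needs no extra install. pendulum.duration accepts the same keyword arguments (weeks, days, hours, minutes), so either form could express these intervals. The variable names are only illustrative.

```python
from datetime import timedelta

# Each of these describes a fixed gap between consecutive runs.
every_30_minutes = timedelta(minutes=30)
every_6_hours = timedelta(hours=6)
every_2_days = timedelta(days=2)
every_week = timedelta(weeks=1)

# Units are interchangeable: one week is the same as seven days.
print(every_week == timedelta(days=7))
```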

12. Applying schedules

To define the schedule, we need to add a schedule parameter to the @dag decorator. This parameter can use any of the schedule types we've seen so far: cron, presets, or timedeltas.
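A minimal sketch of what this can look like follows. It assumes Airflow 3, where the decorators are imported from airflow.sdk (older releases import them from airflow.decorators). The Dag name, task, and start date are hypothetical.

```python
import pendulum
from airflow.sdk import dag, task  # assumption: Airflow 3 import path


@dag(
    schedule="@daily",  # a cron string such as "0 12 * * *" or a duration also works here
    start_date=pendulum.datetime(2024, 2, 25, tz="UTC"),  # hypothetical start date
)
def example_scheduled_dag():  # hypothetical Dag name
    @task
    def say_hello():
        print("Hello from a scheduled run")

    say_hello()


# Calling the decorated function registers the Dag with Airflow.
example_scheduled_dag()
```

Swapping the schedule value for a cron entry, another preset, or a duration changes how often the Dag runs without touching the tasks themselves.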

13. schedule issues

Note that Airflow won't schedule the first Dag run until one full interval has passed beyond the start date; a @daily Dag with a start_date of Feb 25 first runs on Feb 26. This gets more noticeable with longer intervals, like @weekly or @monthly.
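The arithmetic behind this rule is small enough to sketch directly. The function name is invented for illustration, and the year is an assumption since only the month and day are given.

```python
from datetime import datetime, timedelta, timezone


def first_run_time(start_date: datetime, interval: timedelta) -> datetime:
    """Earliest run under the rule above: start_date plus one full interval."""
    return start_date + interval


start = datetime(2024, 2, 25, tzinfo=timezone.utc)  # year is an assumption

print(first_run_time(start, timedelta(days=1)))   # daily interval
print(first_run_time(start, timedelta(weeks=1)))  # weekly interval
```

With a weekly interval, the gap between start_date and the first run grows to a full seven days, which is why the delay is easier to notice on longer schedules.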

14. Let's practice!

Let's practice scheduling workflows now.
