Scheduling Dags
1. Scheduling Dags
Welcome back! Let's schedule our workflows and have them run automatically.
2. Dag Runs
Scheduling in Airflow revolves around the Dag run: an instance of a workflow at a given point in time. For example, it could be the currently running instance, or one that ran last Tuesday at 3pm. A Dag can be run manually, or automatically using the schedule parameter passed when the Dag is defined. Each Dag run maintains state for itself and for its underlying tasks. A Dag run can be in a running, failed, or success state. Each task can be in one of these states, or it can be queued or skipped.
3. Dag Runs view
In the Airflow UI, you can view all Dag runs under the Dags: Dag Runs menu. This provides details about any Dags that have run on the current Airflow instance.
4. Dag Runs state
You can view the state of a Dag run and whether the Dag run was successful.
5. Schedule details
Scheduling a Dag involves several attributes. The start_date specifies the earliest time the Dag could be scheduled, and the end_date represents the last possible time to schedule it. Both are Pendulum datetime objects (recall that Airflow uses Pendulum for timezone handling), created by passing the year, month, and day as arguments, along with a timezone specified by the tz argument. UTC is recommended.
6. Schedule
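As a rough illustration of the start_date and end_date window, here is a minimal sketch using only the standard library. The dates and the helper name in_schedule_window are hypothetical, and stdlib datetimes stand in for Pendulum ones; with Pendulum the start would be written as pendulum.datetime(2025, 2, 25, tz="UTC").

```python
from datetime import datetime, timezone

# Hypothetical example window, timezone-aware and in UTC.
# With Pendulum: pendulum.datetime(2025, 2, 25, tz="UTC")
start_date = datetime(2025, 2, 25, tzinfo=timezone.utc)
end_date = datetime(2025, 3, 31, tzinfo=timezone.utc)


def in_schedule_window(moment, start, end=None):
    """Return True if `moment` lies inside the window; `end` is optional."""
    if moment < start:
        return False
    return end is None or moment <= end


candidate = datetime(2025, 3, 1, tzinfo=timezone.utc)
print(in_schedule_window(candidate, start_date, end_date))  # True
```

Passing no end date models a Dag that can be scheduled indefinitely after its start_date.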
The schedule represents how often to schedule the Dag runs. Scheduling occurs between the start_date and the end_date, if one is set. Note that this is not when the Dag will definitely run, but the range in which runs could be scheduled. The schedule can be defined in a couple of ways, covered next.
7. cron syntax
The cron syntax is the same format used by the Unix cron tool. It consists of five fields separated by spaces: the minute (0-59), the hour (0-23), the day of the month (1-31), the month (1-12), and the day of the week (0-6, where 0 is Sunday). An asterisk means every value of that field; for example, an asterisk in the minute field means run every minute. A field can also take a list of comma-separated values.
8. cron examples
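To make the five-field format concrete, here is a small sketch of a matcher that checks whether a given time satisfies a cron expression. It is not how Airflow or cron is implemented: the function names are hypothetical, it handles only asterisks, single values, and comma-separated lists, and it requires every field to match (real cron treats the two day fields specially when both are restricted).

```python
from datetime import datetime

# Allowed range for each field: minute, hour, day of month, month, day of week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one cron field ('*', '5', or '0,15,30') into a set of ints."""
    if field == "*":
        return set(range(low, high + 1))
    values = {int(part) for part in field.split(",")}
    if not all(low <= value <= high for value in values):
        raise ValueError(f"field {field!r} is outside {low}-{high}")
    return values


def cron_matches(expression, moment):
    """Return True if `moment` matches all five fields of `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("a cron expression needs exactly five fields")
    minute, hour, day, month, weekday = (
        expand_field(field, low, high)
        for field, (low, high) in zip(fields, FIELD_RANGES)
    )
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        moment.minute in minute
        and moment.hour in hour
        and moment.day in day
        and moment.month in month
        and cron_weekday in weekday
    )


# 06:30 on Mondays; March 3, 2025 is a Monday.
print(cron_matches("30 6 * * 1", datetime(2025, 3, 3, 6, 30)))  # True
```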
The cron entry 0 12 * * * means run daily at 12:00 noon. The entry * * 25 2 * means run once per minute, but only on February 25th. The entry 0,15,30,45 * * * * means run every 15 minutes.
9. Airflow scheduler presets
Airflow has several presets, or shortcut syntax options representing often-used time intervals. The @hourly preset means run once an hour at the beginning of the hour. It's equivalent to 0 * * * * in cron. The other presets behave similarly.
10. Special presets
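For reference, the time-based presets can be pictured as a lookup table from shortcut to cron string. The sketch below lists the commonly documented equivalents; the names CRON_PRESETS and to_cron are hypothetical helpers for illustration, not part of Airflow's API, and it is worth checking the documentation for your Airflow version.

```python
# Commonly documented cron equivalents for the time-based presets.
CRON_PRESETS = {
    "@hourly": "0 * * * *",   # top of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight on Sunday
    "@monthly": "0 0 1 * *",  # midnight on the 1st of the month
    "@yearly": "0 0 1 1 *",   # midnight on January 1st
}


def to_cron(schedule):
    """Translate a preset to its cron string; pass cron strings through."""
    return CRON_PRESETS.get(schedule, schedule)


print(to_cron("@hourly"))     # 0 * * * *
print(to_cron("0 12 * * *"))  # 0 12 * * *
```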
Airflow has three special presets. None means never schedule the Dag, and is used for manually triggered workflows. @once means schedule the Dag only once. @continuous starts a new Dag run immediately after the previous run completes.
11. timedelta
Another scheduling method uses Pendulum's duration object. Like timedeltas, durations can be specified in weeks, days, hours, or minutes.
12. Applying schedules
To define the schedule, we need to add a schedule parameter to the @dag decorator. This parameter can use any of the schedule types we've seen so far: cron, presets, or timedeltas.
13. schedule issues
Note that Airflow won't schedule the first Dag run until one full interval has passed beyond the start date; for example, a @daily Dag with a start_date of Feb 25 first runs on Feb 26. The delay becomes more noticeable with longer intervals, like @weekly or @monthly.
14. Let's practice!
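The one-interval delay on the first run can be sketched with plain date arithmetic. This is a simplified model for fixed-length intervals such as timedeltas, with a hypothetical helper name and example year; cron-based presets like @weekly align to calendar boundaries, so their actual first run can differ from this estimate.

```python
from datetime import datetime, timedelta, timezone


def first_run_time(start_date, interval):
    """Earliest run for a fixed interval: one full interval after start_date."""
    return start_date + interval


start = datetime(2025, 2, 25, tzinfo=timezone.utc)
intervals = [
    ("daily", timedelta(days=1)),
    ("weekly", timedelta(weeks=1)),
    ("30 days", timedelta(days=30)),
]
for label, interval in intervals:
    print(label, first_run_time(start, interval).date())
# daily 2025-02-26
# weekly 2025-03-04
# 30 days 2025-03-27
```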
Let's practice scheduling workflows now.