Triggers and Fault Tolerance

1. Triggers and Fault Tolerance

Welcome to the last chapter! We've discussed many aspects of Airflow. Now, let's focus on production environments, starting with triggers and fault tolerance.

2. Building robust pipelines

In Airflow, triggers allow you to control when tasks can run. They enable you to define tasks that execute under specific conditions, such as when other tasks fail. The default trigger is set to all success, but triggers offer more flexibility in cases where a task fails or is skipped. Fault tolerance enables recovery within a task itself. If a task fails, fault tolerance allows you to automatically retry that individual task a configurable number of times, rather than having to rerun the entire Dag.

3. Trigger rules

Trigger rules check the state of previous tasks before the next task starts and let you define if tasks should proceed. By default, all previous tasks must finish successfully before continuing. Trigger rules let you change that behavior.

4. Trigger rules uses

Trigger rules are typically used for things like notification tasks, cleanup tasks, and conditional execution.

5. Key trigger rules

Let's run through the main trigger rules. all_success is the default and means that everything upstream finished successfully without skipping. all_failed is the opposite, meaning only run if all previous tasks failed. all_done checks that each previous task completed, regardless of success or failure. one_failed means at least one previous task failed. one_success means at least one upstream task succeeded. none_failed is similar to all_success, but also allows for skipped tasks. We'll discuss skips in the next lesson.

6. Trigger rule implementation

To use trigger rules, we must first import the TriggerRule enum from airflow.utils.trigger_rule. These are applied to the @task decorator with the trigger_rule attribute. For example, here's a task with trigger_rule set to TriggerRule.all_success.

7. Fault tolerance attributes

Trigger rules help determine if a task should run based on prior tasks. However, a task may still fail, which is where fault tolerance comes in. A few attributes enhance the fault tolerance of your tasks. The first is retries, which specifies the number of times Airflow should retry a failed task before marking it as failed. This is useful for addressing minor issues, such as data not being ready or files not being in place. The second is retry_delay. This takes a timedelta object that defines how long to wait before retrying a task. The appropriate delay can vary depending on the task type and is up to you to determine. The example shown demonstrates setting up to 3 retries with a 5-minute delay between attempts.

8. TriggerDagRunOperator

Airflow has an interesting operator, TriggerDagRunOperator, that allows a task in one Dag to trigger another Dag. This functionality provides options for reusing code and Dags. The TriggerDagRunOperator is imported from airflow.providers.standard.operators.trigger_dagrun and includes key attributes like trigger_dag_id, which must match the dag_id, and wait_for_completion, which determines if further tasks should wait until the child dag is complete. Similar to sensors, there's also an option for poke_interval, which defines how often to check if the child Dag has completed. Finally, the conf attribute allows you to pass data into the child dag.

9. TriggerDagRun Example

Here's an example of the TriggerDagRunOperator that calls the child_pipeline_dag when task1 completes. This operator pauses the parent Dag until the child completes, and checks every 30 seconds until it does. This passes a configuration detail for an s3 bucket to the child Dag. Finally, note that the cleanup task runs after the trigger_child task.

10. Let's practice!

Now, let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.