Monitoring, Alerting, and Callbacks
1. Monitoring, Alerting, and Callbacks
Welcome back! Running Dags in production means more than just scheduling tasks; it means knowing when something goes wrong, and acting on it quickly. In this lesson, you'll learn how to configure Airflow's alerting and monitoring tools so your pipelines can notify you the moment they need attention.2. Dag Lifecycle
Every Dag run and task instance in Airflow moves through a defined sequence of lifecycle states. A task may begin in a queued state, transition to running, and ultimately reach a terminal state, successful, failed, or skipped, depending on the outcome. Dag runs follow a similar pattern at the pipeline level. These state transitions aren't just cosmetic; they're events that Airflow tracks, records, and can act on.3. Callbacks
Callbacks are functions on a Dag run or task that Airflow calls automatically at a given point. There are callbacks available for specific state transitions and they vary somewhat for Dag runs and tasks. The most common callbacks you'll use are the on_failure_callback, triggered when a Dag run or task fails, and the on_success_callback, triggered when they succeed. There are also task-specific callbacks, on_retry_callback, on_skipped_callback, and on_execute_callback, which are called before task retries, skips, and executions.4. Callback context
With each callback, Airflow passes a dictionary object named context. This contains information about the Dag run or task as appropriate. It includes information such as the dag_id and task_id. It can also contain the logical_date, which represents the date the Dag run started. All callback functions must accept the context dictionary or you'll receive an error. Here's a quick example illustrating the context being passed and accessed within the function.5. Callback example
Here we see a full example, with the alert_on_failure function we saw before. To actually call it, we set the on_failure_callback attribute on the Dag decorator and set it to the alert_on_failure function. In this case, if the data_import_task is called, it will automatically fail due to the ValueError exception and Airflow will automatically call alert_on_failure. We'd then get the message in the logs "Task data_import_task in Dag sales_etl_dag has failed." Note, the ValueError exception call is just for illustration and is not typically something you would do.6. Notifiers
Notifiers are Airflow functions that can be tied to a given callback. They are designed to send alerts to external systems. Several different notifier types are available. The SmtpNotifier is the most common and is used to send email alerts. There is also the SlackNotifier, which posts messages to a given Slack channel. There are many other notifiers available, including ones for the PagerDuty or OpsGenie services, which are third-party alerting services Airflow can tie into.7. SmtpNotifier
Let's look at the SMTP notifier, which will send an email on callback. It is imported from the airflow.providers.smtp.notifications.smtp library. It requires from_email and to addresses and can optionally include a subject and html_content, among others. In this case, it's added as the on_failure_callback using the SmtpNotifier constructor with the necessary attributes. This sends an email with subject "Dag sales_etl_dag has failed".8. Audit Log
A great way to monitor what's going on in Airflow is the Audit Log in the Airflow UI. It records a timestamped sequence of all events that occur on the Airflow instance, including callbacks and notifications.9. Let's practice!
We've covered a lot about Airflow alerting. Let's practice!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.