Get startedGet started for free

Debugging and troubleshooting in Airflow

1. Debugging and troubleshooting in Airflow

Welcome back! Let's take a look at one of the biggest aspects of running a production system with Airflow and data engineering in general - debugging and troubleshooting.

2. Typical issues...

There are several common issues you may run across while working with Airflow - it helps to have an idea of what these might be and how best handle them. The first common issue is a DAG or DAGs that won't run on schedule. The next is a DAG that simply won't load into the system. The last common scenario involves syntax errors. Let's look at these more closely.

3. DAG won't run on schedule

The most common reason why a DAG won't run on schedule is the scheduler is not running. Airflow contains several components that accomplish various aspects of the system. The Airflow scheduler handles DAG run and task scheduling. If it is not running, no new tasks can run. You'll often see this error within the web UI if the scheduler component is not running. You can easily fix this issue by running airflow scheduler from the command-line.

4. DAG won't run on schedule (part 2)

As we've covered before, another common issue with scheduling is the scenario where at least one schedule interval period has not passed since either the start date or the last DAG run. There isn't a specific fix for this, but you might want to modify the start date or schedule interval to meet your requirements. The last scheduling issue you'll often see is related to what we covered in the previous lesson - the executor does not have enough free slots to run tasks. There are basically three ways to alleviate this problem - by changing the executor type to something capable of more tasks (LocalExecutor or KubernetesExecutor), by adding systems or system resources (RAM, CPUs), or finally by changing the scheduling of your DAGs.

5. DAG won't load

You'll often see an issue where a new DAG will not appear in your DAG view of the web UI or in the airflow dags list output. The first thing to check is that the python file is in the expected DAGs folder or directory. You can determine the current DAGs folder setting by examining the airflow.cfg file. The line dags underscore folder will indicate where Airflow expects to find your Python DAG files. Note that the folder path must be an absolute path.

6. Syntax errors

Probably the most common reason a DAG workflow won't appear in your DAG list is one or more syntax errors in your python code. These are sometimes difficult to find, especially in an editor not setup for Python / Airflow (such as a base Vim install). I tend to prefer using Vim with some Python tools loaded, or VSCode but it's really up to your preference. There are two quick methods to check for these issues - airflow dags list-import-errors, and running your DAG script with python.

7. airflow dags list-import-errors

The first is to run airflow space dags space list dash import dash errors. Airflow will output some debugging information and the list of DAGs it's processed. If there are any errors, those will appear in the output, helping you to troubleshoot further.

8. Running the Python interpreter

Another method to verify Python syntax is to run the actual python3 interpreter against the file. You won't see any output normally as there's nothing for the interpreter to do, but it can check for any syntax errors in your code. If there are errors, you'll get an appropriate error message. If there are no errors, you'll be returned to the command prompt.

9. Let's practice!

Let's practice handling some of these common issues in the exercises ahead.