
Creating a production pipeline

1. Creating a production pipeline

We're almost to the end of this course, and we've covered an extensive amount of material about Airflow. Let's go over a few reminders before building out some production pipelines.

2. Running DAGs & Tasks

You may remember way back in chapter 1, we discussed how to run a task. If not, here's a quick reminder - use airflow tasks test dag_id task_id execution_date from the command line. This will execute a specific DAG task as though it were running on the date specified. To run a full DAG, use airflow dags trigger -e execution_date dag_id. This executes the full DAG as though it were running on the specified date.
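
For reference, here's how those two commands look at a shell prompt (the example_dag, cleanup_task, and date values are placeholders - substitute your own):

    airflow tasks test example_dag cleanup_task 2023-01-01
    airflow dags trigger -e 2023-01-01 example_dag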

3. Operators reminder

We've been working with operators and sensors through most of this course, but let's take a quick look at some of the most common ones we've used. The BashOperator behaves like most operators, but expects a bash_command parameter - a string of the command to be run. The PythonOperator requires a python_callable argument with the name of the Python function to execute. The BranchPythonOperator is similar to the PythonOperator, but its python_callable must be a function that accepts the task context via a kwargs entry and returns the task_id of the branch to follow. As such, the provide_context attribute must be set to True. The FileSensor requires a filepath argument as a string, and may need mode or poke_interval attributes. You can refer to previous chapters for further detail if required.
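
To tie those reminders together, here's a minimal sketch of a DAG using all four. The DAG id, task ids, file path, and bash command are hypothetical, and the imports assume Airflow 2.x; the Airflow 1.x difference around provide_context is noted in the comments:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import BranchPythonOperator, PythonOperator
    from airflow.sensors.filesystem import FileSensor


    def process_data():
        print("Processing data...")


    def pick_branch(**kwargs):
        # Branch callables receive the task context via **kwargs and must
        # return the task_id (or list of task_ids) to follow.
        return "process_task"


    with DAG("example_dag", start_date=datetime(2023, 1, 1),
             schedule_interval=None) as dag:

        sense_file = FileSensor(
            task_id="sense_file",
            filepath="data.csv",   # string path to wait for
            poke_interval=30,      # seconds between checks
            mode="reschedule",     # free the worker slot between pokes
        )

        branch = BranchPythonOperator(
            task_id="branch",
            python_callable=pick_branch,
            # On Airflow 1.x, also set provide_context=True here;
            # Airflow 2.x passes the context automatically.
        )

        report = BashOperator(task_id="report",
                              bash_command="echo 'All done'")
        process_task = PythonOperator(task_id="process_task",
                                      python_callable=process_data)

        sense_file >> branch >> [process_task, report]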

4. Template reminders

A quick reminder: many objects in Airflow can use templates, but only certain fields accept templated strings. It can be tricky to remember which objects support templates on which fields. One way to check is to use the built-in Python documentation via a live interpreter. To use this method, open a python3 interpreter at the command line and import any necessary libraries (e.g., the BashOperator). At the prompt, call help() with the Airflow object as its lone argument, then look for a line referencing template_fields. This line lists which fields, if any, accept templated strings.

5. Template documentation example

This is an example of checking the documentation with help() in the Python interpreter. Notice the template_fields entry in the output - in this case, the bash_command and env fields can accept templated values.
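
As a sketch of that session (assuming Airflow 2.x import paths; the help() output is abridged to the relevant line and its exact formatting varies by version):

    $ python3
    >>> from airflow.operators.bash import BashOperator
    >>> help(BashOperator)
    Help on class BashOperator in module airflow.operators.bash:
    ...
     |  template_fields = ('bash_command', 'env')
    ...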

6. Let's practice!

A final note before working through our last exercises - as a data engineer, your job is not necessarily to understand every component of a workflow. You may not fully understand all of a machine learning process, or how an Apache Spark job works. Your job is to implement those components in a repeatable and reliable fashion. Let's practice implementing workflows one last time in this course.