Creating a production pipeline
1. Creating a production pipeline
We're almost at the end of this course, and we've covered a great deal about Airflow. The pieces we've discussed will enable you to create a production-level pipeline. Let's review a few key reminders before building out some production pipelines.
2. Running Dags and tasks
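As a quick sketch of the two commands this section covers, the Python snippet below only assembles and prints the command lines; it does not call Airflow. The Dag id etl_pipeline, the task id clean_data, and the date are hypothetical placeholders. The --logical-date flag follows this transcript, so it is worth checking airflow dags trigger --help on your installed version.

```python
import shlex


def tasks_test_command(dag_id: str, task_id: str, logical_date: str) -> list[str]:
    """Argument list for running one task as of a given date."""
    return ["airflow", "tasks", "test", dag_id, task_id, logical_date]


def dags_trigger_command(dag_id: str, logical_date: str) -> list[str]:
    """Argument list for triggering a whole Dag as of a given date."""
    return ["airflow", "dags", "trigger", "--logical-date", logical_date, dag_id]


# Hypothetical ids and date, for illustration only.
single_task = tasks_test_command("etl_pipeline", "clean_data", "2024-01-15")
full_dag = dags_trigger_command("etl_pipeline", "2024-01-15")

print(shlex.join(single_task))
print(shlex.join(full_dag))
```

Either list could be handed to subprocess.run on a machine where the airflow command is installed.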
You may recall from the first chapter that we discussed how to run a task in Airflow. Here's a quick reminder. From the command line, you can run a specific task using airflow tasks test, followed by the dag_id, the task_id, and the execution date. This executes that one task as though it were running on the specified date. To run a full Dag, use airflow dags trigger --logical-date, followed by the execution date and the dag_id. This executes the full Dag as though it were running on the specified date.
3. Tasks reminder
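To make the decorator roles in this section concrete, here is a minimal sketch of the plain Python functions such decorators would wrap. The decorators appear only as comments so the snippet runs without Airflow installed. The function names, the file path, and the task ids load_data and send_alert are hypothetical.

```python
import shlex


# In a Dag file this function would carry @task.bash: the string it
# returns is what gets run as the shell command.
def count_lines(path: str) -> str:
    return f"wc -l {shlex.quote(path)}"


# In a Dag file this function would carry @task.branch: the value it
# returns is the task_id of the downstream task to follow.
def choose_next_task(row_count: int) -> str:
    if row_count > 0:
        return "load_data"
    return "send_alert"


print(count_lines("/tmp/sales data.csv"))
print(choose_next_task(0))
```

Keeping the decision logic in an ordinary function like choose_next_task also makes it easy to unit test before it is wired into a Dag.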
We've been working with operators and sensors through most of this course, so let's take a quick look at some of the most common ones we've used. The @task decorator designates a given Python function as an Airflow task. @task.bash is similar, but the string the function returns is run as a command in the shell. @task.branch uses the function's logic to choose between two or more downstream tasks: the function returns the task_id of the task to run next, based on a given attribute or set of attributes. The FileSensor requires a filepath argument, given as a string, and might need the mode or poke_interval arguments.
4. Template reminders
Here's a quick reminder that many objects in Airflow can use templates. However, only certain fields can accept templated strings, making it tricky to remember which fields support templates. One way to check is to use Python's built-in help in a live Python interpreter. To use this method, open a python3 interpreter at the command line. Next, import any necessary libraries, including the task object you want to check. At the prompt, run help with the name of the Airflow object as the lone argument. Look for a line that references template_fields. This line specifies which fields can use templated strings.
5. Template documentation example
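The check described above can be sketched without Airflow installed by using a stand-in class. DemoOperator is hypothetical, not an Airflow class; it only mimics how an operator exposes a template_fields attribute, which is the entry to look for in the help output. Its two field names are borrowed from this transcript for illustration.

```python
class DemoOperator:
    """Hypothetical stand-in for an Airflow operator class."""

    # Names of the fields that may contain templated strings.
    template_fields = ("bash_command", "env")


def templated_fields(operator_class) -> tuple:
    """Return the class's template_fields, or an empty tuple if it has none."""
    return tuple(getattr(operator_class, "template_fields", ()))


# help(DemoOperator) would list template_fields among the class attributes;
# reading the attribute directly gives the same information.
print(templated_fields(DemoOperator))
```

With Airflow installed, the same lookup should work on a real operator class once it is imported.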
This is an example of checking for help in the Python interpreter. Notice the template_fields entry in the output. In this case, the bash_command and env fields can accept templated values.
6. Working with Airflow
A final note before working through our last exercises. As a data engineer, your job is not necessarily to understand every component of a workflow. You may not fully understand all of a machine learning process, or perhaps how an Apache Spark job works. Your job is to implement any of those tasks in a repeatable and reliable fashion.
7. Let's practice!
Let's practice implementing workflows for the last time in this course now.