Get startedGet started for free

Defining the DAG

In the previous exercises, you've completed the extract, transform and load phases separately. Now all of this is put together in one neat etl() function that you can discover in the console.

The etl() function extracts raw course and ratings data from relevant databases, cleans corrupt data and fills in missing value, computes average rating per course and creates recommendations based on the decision rules for producing recommendations, and finally loads the recommendations into a database.

As you might remember from the video, etl() accepts a single argument: db_engines. You can pass this to the task using op_kwargs in the PythonOperator. You can pass it a dictionary that will be filled in as kwargs in the callable.

This exercise is part of the course

Introduction to Data Engineering

View Course

Exercise instructions

  • Complete the DAG definition, so it runs daily. Make sure to use the cron notation.
  • Complete the PythonOperator() by passing the correct arguments. Other than etl, db_engines is also available in your workspace.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define the DAG so it runs on a daily basis
dag = DAG(dag_id="recommendations",
          schedule_interval="____")

# Make sure `etl()` is called in the operator. Pass the correct kwargs.
task_recommendations = PythonOperator(
    task_id="recommendations_task",
    python_callable=____,
    op_kwargs={"____": ____},
)
Edit and Run Code