Hierarchical models in dbt
1. Hierarchical models in dbt
Welcome back to our last topic of this course! We've covered the basics of working with and updating models. Let's now take a look at hierarchical models and the components that go with them.
2. What is a hierarchy in dbt?
First, what is a hierarchy in dbt? A hierarchy represents the dependencies within a dbt project, meaning the relationships between source data and transformed data. This is also known as a DAG, or directed acyclic graph, and is sometimes called a lineage graph. Note that while a DAG is a common concept in data engineering tools, such as Spark or Airflow, we're referring to a DAG specifically as implemented in dbt. The primary purpose of a DAG or hierarchy is to allow models to be built and updated with their dependencies in mind. dbt must determine the order in which the models need to be built and then run them accordingly.
3. Hierarchy details
Here, avg_fare_per_day and total_creditcard_riders_per_day are two tables that each depend on the taxi_rides_raw table. Knowing this hierarchy, dbt will build the taxi_rides_raw table first to make certain the data is available to build the downstream tables. Without the lineage graph, that is, without this hierarchy, the tables would be built in alphabetical order, and the build would fail at the avg_fare_per_day model because taxi_rides_raw would not yet exist.
4. How are hierarchies defined?
The next question is how hierarchies are defined in dbt. We use the Jinja template language to define the model dependencies. This is done within the model definition file, meaning the .sql file. Most often, we define the hierarchy using the ref function within a Jinja template. To define a dependency, we replace the table name in our query with two opening curly braces, then ref, an opening parenthesis, the model name in single quotes, and a closing parenthesis, followed by two closing curly braces: {{ ref('model_name') }}. The next step is to use dbt run, which will materialize the models. dbt will replace the ref templates with the actual table names in the generated SQL file.
5. Hierarchy example
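The ref syntax described above can be sketched as two versions of the same model file. This is a minimal illustration rather than the exact query from the course: the file name models/avg_fare_per_day.sql, the column names pickup_datetime and fare_amount, and the analytics schema in the final comment are all assumptions made for the example.

```sql
-- Version 1 of models/avg_fare_per_day.sql (hypothetical file):
-- the table name is hard-coded, so dbt records no dependency.
select
    cast(pickup_datetime as date) as ride_date,  -- hypothetical column
    avg(fare_amount) as avg_fare                 -- hypothetical column
from taxi_rides_raw
group by 1
```

```sql
-- Version 2 of the same file: the table name is wrapped in ref(),
-- so dbt knows this model depends on the taxi_rides_raw model.
select
    cast(pickup_datetime as date) as ride_date,  -- hypothetical column
    avg(fare_amount) as avg_fare                 -- hypothetical column
from {{ ref('taxi_rides_raw') }}
group by 1

-- In the compiled SQL, the template is replaced by a concrete
-- relation name, for example: from analytics.taxi_rides_raw
-- (the schema name here is an assumption; the real qualifier
-- depends on the adapter and profile in use).
```

Only the from clause differs between the two versions, which is why adding a dependency to an existing model is usually a one-line change.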
A quick example illustrates the change. In the first query, we're directly using the name of the table. While this works, we may run into issues if the table has not been created yet. To add the dependency, you'll notice we use Jinja to change the table name from taxi_rides_raw to {{ ref('taxi_rides_raw') }}.
6. Let's practice!
We've covered a lot in this video. Let's practice our new skills in the exercises ahead!