dbt sources

1. dbt sources

Welcome back! Let's cover a different way to work with the data in your data warehouse: dbt sources.

2. What is a dbt source?

In dbt, sources let you name and describe the data loaded by the extract and load (EL) process. In other words, a source attaches extra information to data that is already in your data warehouse. Remember that dbt only handles the transformation portion of the ELT process. We'll elaborate shortly, but sources are helpful for defining data lineage, data testing, and data documentation.

3. Sources

dbt sources exist primarily to provide data lineage information. As a reminder, data lineage describes the flow of data through a data warehouse. It helps with validation, troubleshooting, and various aspects of data trust, such as how important a piece of data is and where it came from. To access a given source, we use the Jinja source function. In the example, we call the source function instead of hard-coding a table name or reference. It is used much like the ref function. Here "raw" is the source name, which by default dbt also treats as the database schema, and "orders" is the name of the table. The source function also simplifies accessing the data: it gives you one way to reference your tables regardless of your data warehouse, instead of needing to know the naming specifics of each warehouse type.

4. Defining a source

To define a source in dbt, we use a YAML file once again. This can be the models/model_properties.yml file used before, or any other .yml file in the models directory. Note that the name model_properties.yml is not special: dbt reads any yml file in that directory for this information. The actual definition goes in the sources: section of the yml file. Start each source with a - name entry. This is usually the name of the schema (or database, depending on the warehouse) that holds the loaded data, such as raw. We then define each source table with its own - name entry under the tables: section. An example shows two tables, phone_orders and web_orders, defined under the raw source. Note that different options are available depending on the data warehouse type. Refer to the dbt documentation for more information.
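As a rough sketch of what that file might contain, here is the raw source with its two tables. The file name, the version line, and the description text are illustrative assumptions rather than anything shown in the course.

```yaml
# models/model_properties.yml
# (the file name is a convention; any .yml file under models/ is read)
version: 2

sources:
  - name: raw                  # source name; used as the schema unless `schema:` is set
    description: "Placeholder description of the data loaded by the EL process"
    tables:
      - name: phone_orders
      - name: web_orders

# A model could then read one of these tables with:
#   select * from {{ source('raw', 'phone_orders') }}
```

The first argument to source matches the - name of the source, and the second matches a - name under tables:.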

5. Accessing sources

To access a source in our models, we use the source Jinja function. The source function takes two arguments, source_name and table_name. It returns the proper name for the source table, depending on the data warehouse configuration. An example shows the code for a dbt model that we could call all_orders. It represents a union of the phone_orders and web_orders tables we saw previously. Once compiled, the source function call is replaced by the database name, a dot, and the table name. The exact form will vary depending on your data warehouse.
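To illustrate how the compiled name can depend on configuration, the sketch below sets the optional database, schema, and identifier properties on a source. The values analytics, raw_data, and phone_orders_v2 are hypothetical, and the exact quoting and number of name parts in the compiled SQL depend on the warehouse adapter.

```yaml
version: 2

sources:
  - name: raw
    database: analytics          # hypothetical database name
    schema: raw_data             # hypothetical; defaults to the source name if omitted
    tables:
      - name: phone_orders
        identifier: phone_orders_v2   # hypothetical physical table name

# With this configuration, {{ source('raw', 'phone_orders') }} would compile
# to something like: analytics.raw_data.phone_orders_v2
```

The model code keeps calling source('raw', 'phone_orders') even if the physical location changes, which is the main reason to prefer it over a hard-coded table name.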

6. Testing sources

You can also apply tests to sources, using the same methods that you apply to models. This includes the built-in tests, singular tests, and your own generic (reusable) tests. Built-in and generic tests are declared in the sources: section of the yml file, instead of the models: section, in the same yml file where the sources are defined. Singular tests remain standalone SQL files and can refer to a source table through the source function.
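A minimal sketch of built-in tests attached to source tables might look like the following. The order_id column is a hypothetical name chosen for illustration, and newer dbt versions also accept data_tests: as the key in place of tests:.

```yaml
version: 2

sources:
  - name: raw
    tables:
      - name: phone_orders
        columns:
          - name: order_id       # hypothetical column name
            tests:
              - unique
              - not_null
      - name: web_orders
        columns:
          - name: order_id       # hypothetical column name
            tests:
              - not_null
```

Assuming this layout, the source tests run along with everything else under dbt test, and they can typically be run on their own with a selector such as dbt test --select "source:raw".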

7. Let's practice!

It's time to apply what we've learned in the coming exercises.