Building dbt data marts and snapshot models

1. Building dbt data marts and snapshot models

Welcome back! It's time to build data marts and snapshot models!

2. Introducing dbt data mart models

dbt data marts are a set of clean and accessible models at the end of the ELT pipeline. Here are some business reasons for building data marts. From a code perspective, data marts are also useful for reducing repetition by storing SQL as code.

3. Looker e-commerce data marts

For Looker e-commerce, we will build two data marts. A customers mart answers questions like "Who are our customers" and "What are they purchasing"? A products mart answers questions like "What is our revenue and profit?" Customers mart will be one row per customer. Products mart, one row per product.

4. Looker e-commerce data marts

For each of the mart tables, we will be joining a few staging models together. This diagram visualizes what we are building toward. As we can see, it takes several layers of transformation to build toward one data mart.

5. Building step-by-step

Because data mart join logic can be complex, the SQL should always be refined first outside of dbt. Once we can verify that the SQL does run and works as expected, then the SQL gets pasted into a dbt model file, and the tables are replaced with dbt references. After that, we build data tests and documentation inside dbt yaml files. Finally, we test our build!

6. Introducing dbt snapshot models

While data marts are great for answering business questions, they are usually designed in a way that is hard to track changes over time. For example, consider the existing orders staging model. It's built off of raw orders data which is one row per order and tracks the latest known status per order. The five statuses are: Processing, Shipped, Complete, Cancelled, and Returned. Here, both orders 88616 and 88641 are returned orders. If we wanted to know how long it took for either orders to get from "Processing" to "Returned", we can't! That data has already been overwritten! The only way to do so is to create a separate dbt snapshot that records the change over time.

7. Orders: snapshot status change

To create a snapshot model, we create a new file and at the top of the file, write jinja syntax curly-bracket-percentage-sign-snapshot, followed by the model name, and then ending with percentage-sign-end-curly-bracket to indicate that this is a snapshot model. The configuration settings are for further specifications on how to snapshot. Instead of `dbt run`, notice we use `dbt snapshot` to capture incremental changes in the data. Although we are running the snapshot manually as a one-off, in real life, we would schedule the run either on dbt cloud or some other orchestration tool, so that we have control over timing.

8. Let's practice!

Let's dive right in!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.