Automating with dbt build

1. Automating with dbt build

We're going to look at another dbt subcommand: dbt build.

2. Review

We're almost at our last set of exercises, so let's review the purpose of some major components in dbt. When working with data, we tend to start with sources and seeds. Sources represent the data initially coming from our extract and load processes. Seeds represent supporting data like country lists, area codes, or postal codes. The primary components in dbt are models, which handle the transformation of data, typically from sources or seeds, for downstream users. We can use tests to validate many of the components within dbt, including sources, seeds, and models. Remember that there are several types of tests available within dbt, including built-in, singular, and generic (reusable) tests. Finally, the dbt build subcommand can be used to perform all these tasks, usually in a production environment.

3. dbt build

The dbt build subcommand is a shortcut for performing all the usual tasks within a dbt project. It combines multiple dbt tasks: it runs the models, runs validations via tests, and loads any seeds as needed. All tasks are run together as a single job, so that a failure anywhere along the line can be handled as a whole instead of potentially causing issues within a production dataset. dbt build also runs some other operations if they are defined, such as snapshots. Note that dbt build does not perform any of the dbt docs operations, such as generating documentation or running the documentation server. Remember, if needed, the commands can also be run individually instead of using dbt build.
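
To make the comparison concrete, here is a minimal sketch of the two approaches. The function names run_steps_individually and run_with_build are hypothetical wrappers used only for illustration, and the sketch assumes dbt is installed and is being run from inside a dbt project.

```shell
# Hypothetical wrapper: run each step as its own dbt invocation.
# With this approach, a failing test is only discovered after
# every model has already been built.
run_steps_individually() {
  # load the CSV seed files
  dbt seed &&
  # capture snapshots, if any are defined
  dbt snapshot &&
  # build all models
  dbt run &&
  # run the tests on sources, seeds, and models
  dbt test
}

# Hypothetical wrapper: one invocation that handles seeds, snapshots,
# models, and tests together, ordered by the project's dependencies.
run_with_build() {
  dbt build
}
```

Neither sketch touches documentation: dbt docs generate and dbt docs serve would still need to be run separately.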

4. dbt build - why?

You may wonder about the purpose of dbt build, given that it only combines the behavior of separate subcommands. The individual subcommands work well on their own, but they may not catch every issue, especially when run in a production environment. dbt run builds the models but doesn't validate the data first, meaning that no tests are run before the models are updated. dbt seed, run on its own, might leave seed data incomplete for certain queries in downstream models. dbt build determines dependencies across the whole project and runs each resource's tests before building anything downstream of it, so a failed test causes the dependent resources to be skipped rather than updated in production. Note that dbt build may be overkill in testing environments or when only small changes are made. In general, it's best to use dbt build when running against a production data warehouse, while the individual commands can be used in development or testing.
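
As a rough sketch of that split, the two functions below separate a production run from a development loop. The function names, the target names prod and dev, and the model name passed in are all assumptions for illustration; targets are whatever your own profiles.yml defines.

```shell
# Hypothetical wrapper for production: a single dbt build invocation
# against the (assumed) "prod" target.
deploy_prod() {
  dbt build --target prod
}

# Hypothetical wrapper for development: rebuild one model, then run
# only that model's tests, against the (assumed) "dev" target.
iterate_dev() {
  model="$1"
  dbt run --select "$model" --target dev &&
  dbt test --select "$model" --target dev
}

# Example usage (my_model is a placeholder model name):
#   iterate_dev my_model
#   deploy_prod
```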

5. dbt build options

Let's look at a few options available in dbt build that may be of use. The first is the --select option, which we've seen before with dbt test. It takes a list of specific dbt models to build, and is useful in projects with many models. The next is the -d option, which prints debugging information with further details while dbt build runs. The last we'll consider is the --exclude option, which takes a list of dbt objects to exclude from the build process. This may be useful when you have a large model that doesn't need to be built in certain situations.
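
The three options can be sketched as follows. The wrapper function names and the model names in the usage comments are hypothetical; the flags themselves are the ones described above, with -d being the short form of --debug.

```shell
# Hypothetical wrapper: build only the named models and their tests.
build_only() {
  dbt build --select "$@"
}

# Hypothetical wrapper: build everything except the named objects.
build_without() {
  dbt build --exclude "$@"
}

# Hypothetical wrapper: build with debug-level logging.
# --debug is the long form of -d; placing it before the subcommand
# is the form that is safest across dbt versions.
build_debug() {
  dbt --debug build "$@"
}

# Example usage (model names are placeholders):
#   build_only model_a model_b
#   build_without large_model
#   build_debug --select model_a
```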

6. Let's practice!

Let's work through many of the concepts we've seen in this course in the exercises ahead.