
Creating dbt descriptions and tests

1. Creating dbt descriptions and tests

Welcome back! Previously, we completed our project setup, loaded data, and built some dbt models. Now, let's layer user-defined descriptions and data tests on top of these models.

2. docs: dbt user-defined descriptions

Documentation is critical to understanding, maintaining, and collaborating on code, and dbt is no exception. In dbt, documentation is written in YAML files as what dbt calls user-defined descriptions. These YAML files can document models, sources, seeds, data tests, and more. Store each YAML file in the same directory as the assets it describes. As a best practice, name the file after the assets it documents. For example, underscore-looker-underscore-models-yaml is for documenting dbt models.

3. docs: dbt model yaml

Let's take a look at a sample user-defined description YAML file. First, note the version number at the top of the file. It indicates which version of the schema configuration format the file uses. Version 2 is currently the latest, as set by dbt, so there is no need to change it. Second, the keyword models on line 3 tells us that this YAML file documents dbt models. If we were creating a YAML file for dbt sources, this keyword would be replaced with sources. Lastly, we can document both the model and its columns. However, notice the spacing, because indentation is significant in YAML. Model names are prefaced by two spaces, followed by a dash and a name. Columns are prefaced by four spaces.
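As a minimal sketch of such a file, the layout described above might look like the following. The model name orders, its columns, and all description text are hypothetical placeholders, not taken from the course project.

```yaml
version: 2

models:
  - name: orders            # hypothetical model name, indented two spaces
    description: One row per customer order.
    columns:                # the columns key is indented four spaces
      - name: order_id      # hypothetical column
        description: Primary key of the orders model.
      - name: status        # hypothetical column
        description: Current fulfilment status of the order.
```

For a file that documents sources instead, the top-level models key would be replaced by sources.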

4. dbt data tests: not null and unique

dbt model YAML files are used for more than documenting what each column means. They are also the place to define data tests. dbt ships with four out-of-the-box data tests. First, the unique test fails if the specified column contains duplicate values. Second, the not null test fails if the column contains null values. Empty strings are not null, so they do not cause it to fail. Either test can be applied to any column on its own, but they are most often used together on primary key columns, since a primary key must be both unique and not null.
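A primary key check of this kind could be sketched as follows. The model orders and column order_id are hypothetical, and the data_tests key assumes a recent dbt version; older versions use the key tests for the same purpose.

```yaml
version: 2

models:
  - name: orders                # hypothetical model name
    columns:
      - name: order_id          # hypothetical primary key column
        description: Primary key of the orders model.
        data_tests:             # older dbt versions use `tests:` instead
          - unique              # fails if any value appears more than once
          - not_null            # fails if any value is null
```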

5. dbt data tests: accepted values

Third, the accepted values test makes sure that only the values in a given list appear in the column. If any other value appears, the test fails. This guards against data drift.
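As an illustration, an accepted values test might be declared like this. The model, the column, and the list of statuses are hypothetical, and the data_tests key assumes a recent dbt version (older versions use tests). Newer dbt releases may also expect the values list to be nested under an arguments key, so treat the exact shape as an assumption to check against your version.

```yaml
version: 2

models:
  - name: orders                # hypothetical model name
    columns:
      - name: status            # hypothetical column
        data_tests:             # older dbt versions use `tests:` instead
          - accepted_values:
              # any value outside this list makes the test fail
              values: ['placed', 'shipped', 'completed', 'returned']
```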

6. dbt data tests: relationships

Lastly, the relationships test is the dbt version of creating foreign keys to other tables. It helps us verify that data is propagating correctly from upstream to downstream tables. In this example, the data test ensures that every value in column-one of table-one also exists as a value in column-two of table-two. Otherwise, the test will fail noisily.
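The example described above could be written roughly as follows, with the spoken names rendered as table_one, column_one, table_two, and column_two. This sketch assumes that table_two is itself a dbt model, so it can be referenced with ref, and that a recent dbt version is in use (older versions use tests in place of data_tests).

```yaml
version: 2

models:
  - name: table_one
    columns:
      - name: column_one
        data_tests:             # older dbt versions use `tests:` instead
          - relationships:
              to: ref('table_two')   # the table that must contain the value
              field: column_two      # the column in table_two to match against
```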

7. Let's practice!

Let's practice!