dbt seeds

1. dbt seeds

Let's now talk about dbt seeds and why we'd use them.

2. What are dbt seeds?

The first question is, what are dbt seeds? dbt seeds are CSV files that can be loaded into your data warehouse. These are typically rarely changing sets of data, such as lists of countries or lists of postal codes. Most importantly, seeds are not meant to contain raw data or data exported from another process.

3. Why?

You may be curious why to use dbt seeds. There are three primary reasons: First, CSV files are relatively easy to manage. You can edit them manually if needed, copy them, and so on. CSV files are easy to use, whether that's development, testing, or production options. Finally, CSV files, being text, are source controllable. These can be easily added into git repositories, compared against previous versions, and so forth.

4. How are seeds defined?

To define a seed in dbt, we add a CSV file to the seeds subdirectory in your dbt project. Make sure the header is the first row of the file. Once ready, use the dbt seed command to load the data into the data warehouse. Here's an example of a zipcode file found in our seeds directory. The header shows three columns of zipcode, place, and state, followed by a short list of locations. We then run the dbt seed command at the bash prompt to complete the import.

5. Further configuration

Further configuration options available for seeds in dbt. These include which schema and database to add the seed data. You can define options, such as whether to quote any, all, or specific columns. More importantly, you can define which datatypes to assign to given columns within the seed data. These changes can be applied to a whole project or to individual seeds as needed. If added to the whole project, you can use the dbt_project.yml file. Otherwise, these can be added to the seeds/properties.yml file. Note as with other components of dbt, you can name the properties.yml file whatever you like as long as it's present in the seeds directory.

6. Defining datatypes

One very common operation when working with seeds is defining a datatype on the columns. The available types will depend on your data warehouse. The typical options are available, including integer, varchar, and so forth. If the type is not defined, the data type is inferred from the seed data. Here's an example with the necessary YAML structure. Note that in this case, we're defining zipcode as a varchar of 5 characters instead of an integer. This is to preserve the leading zeros present in some US zip codes.

7. Test

Seeds also support tests just as models and sources do. In our zip code example we've added a unique validation test for the zipcode column.

8. Accessing seeds

Finally, let's discuss how to access seeds in your models after they're added to the warehouse. To access the seed by name, use the Jinja ref function. The seed behaves as a model once created, so you'd use the same function as you would for a model.

9. Let's practice!

We've covered a lot about seeds in dbt. Let's practice what we've learned in the exercises ahead.

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.