Creating and generating dbt documentation
1. Creating and generating dbt documentation
Welcome back! Let's take a look at an often overlooked issue in data engineering and warehousing: documentation.
2. Why document?
It may sound like a silly question, but we should consider why we want to document a data project. It's common when working on a data project to create notes within the code and queries for ourselves or any other engineers working at that level. Data projects, however, have more consumers than just these roles, and it's not feasible to provide everyone with access to source code just for documentation purposes. Centralizing the source of documentation is another benefit of documenting data projects. While it is possible to provide this information via something like email, it's far easier to have the documentation available at a known location. We're also interested in providing details about updates and changes to our data feeds, as well as creating a repository for examples, suggestions for the use of the data, and details about SLAs (Service Level Agreements: how often the data is updated and any guarantees about its accessibility).
3. Creating documentation in dbt
dbt provides several ways to add documentation to your project automatically. This includes adding information to model definitions, such as the overall model description and, if desired, individual column descriptions. The dbt documentation can also show the data lineage or DAG (directed acyclic graph), meaning the flow of data from the initial source tables through any transformation or aggregate tables we create. We can also get information about the tests and data validations applied to our models from the dbt documentation tools. Finally, we can see details about the generated data warehouse, including the column data types and data sizes that are created when the data is processed.
4. Generating documentation in dbt
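The model and column descriptions mentioned above usually live in a YAML properties file inside the project's models directory. The sketch below writes one such file from Python so that its shape is visible. The model name customers, its columns, and the file name schema.yml are hypothetical and chosen only for illustration; a real project would use its own names.

```python
from pathlib import Path
import tempfile

# Hypothetical model and column names, used only for illustration.
SCHEMA_YML = """\
version: 2

models:
  - name: customers
    description: "One row per customer, built from the raw orders data."
    columns:
      - name: customer_id
        description: "Unique identifier for each customer."
        tests:
          - unique
          - not_null
      - name: first_order_date
        description: "Date of the customer's first order."
"""


def write_schema(project_dir: str) -> Path:
    """Write the properties file to <project_dir>/models/schema.yml."""
    path = Path(project_dir) / "models" / "schema.yml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(SCHEMA_YML, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the sketch has no side effects.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_schema(tmp).read_text(encoding="utf-8"))
```

The descriptions and tests declared this way are the kind of information that appears on the model's page in the generated documentation.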
To actually generate the documentation, dbt provides a subcommand, dbt docs. The dbt docs subcommand has a few subcommands of its own, plus a help option, dbt docs -h, which describes the commands available for dbt docs. Let's talk about the primary one, dbt docs generate. This traverses the content of our project and automatically builds the documentation as a static website. Because this documentation changes as we add models, tests, and so on, we should run this command after building the project with dbt run.
5. Accessing documentation
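The run-then-generate order described above can be scripted. The following is a minimal sketch, assuming dbt is installed and available on the PATH; the function name generate_docs, the project_dir argument, and the runner parameter (there only so the call can be swapped out when testing) are hypothetical and not part of dbt.

```python
import subprocess
from typing import List

# Commands named in the narration, in the recommended order.
DOC_COMMANDS: List[List[str]] = [
    ["dbt", "run"],
    ["dbt", "docs", "generate"],
]


def generate_docs(project_dir: str, runner=subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in DOC_COMMANDS:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(cmd, cwd=project_dir, check=True)
```

Calling generate_docs("path/to/project") runs both commands inside that directory. By default, the generated site files should then appear in the project's target directory.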
To access the generated documentation, we'll need a web browser and somewhere to host the documentation. There are several hosting options depending on our needs. One is the other subcommand of dbt docs, dbt docs serve. This starts a web server on the local system and provides access to the documentation. Note that while convenient, this should only be used locally during development, as it is not designed with security in mind. The other option is to use a separate hosting service. This can include dbt Cloud, Amazon S3, or any modern web server such as Nginx or Apache.
6. Documentation example
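Because the generated documentation is a static site, any static file server can host it. The sketch below uses Python's standard library to stand in for such a server; it is not how dbt docs serve is implemented. It assumes the site files (index.html, manifest.json, and catalog.json) are in a directory named target, and the helper names missing_artifacts and make_server are hypothetical.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import List

# Files that dbt docs generate is expected to leave in the target directory.
REQUIRED_FILES = ("index.html", "manifest.json", "catalog.json")


def missing_artifacts(docs_dir: str) -> List[str]:
    """Return the required files that are absent from docs_dir."""
    root = Path(docs_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def make_server(docs_dir: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build a static file server for docs_dir, bound to localhost only."""
    handler = partial(SimpleHTTPRequestHandler, directory=docs_dir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    docs_dir = "target"  # assumed default output directory
    missing = missing_artifacts(docs_dir)
    if missing:
        print("Missing files, run dbt docs generate first:", missing)
    else:
        server = make_server(docs_dir)
        print("Serving on http://127.0.0.1:8000 (Ctrl+C to stop)")
        server.serve_forever()
```

Like dbt docs serve, this standard-library server is only suitable for local development; a shared deployment would more likely copy the same files to a hosting service or a web server such as Nginx.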
The example view of the documentation page provides details of the models, their descriptions, column details, and the lineage graphs.
7. Let's practice!
We've covered quite a bit about using documentation in dbt. Let's solidify what we've learned in the exercises ahead.