What is a dbt model?
1. What is a dbt model?
Let's now discuss the idea of a dbt model, what it is, and how it's used within dbt.2. What is a data model?
Before getting into how dbt uses models, let's discuss the general definition of a data model. There is no hard definition of a data model—its meaning depends on the context. At its core, a data model defines the logical organization and interpretation of a dataset whether a database table, Dataframe, or so forth. This could be a group of orders, customers, or the details of earthquakes in a given region. The data model also represents how a set of data and its components relate to each other. For example, various features of an animal, including number of legs, does it fly, etc, and how that information is maintained within the dataset. A primary purpose of a data model is to help users collaborate and understand the data in a common way.3. What is a data model?
We've created a list of animal species, the number of legs, and if the animal is venomous. While a simple model, this creates an specified list of features used when discussing an animal. Obviously we could use a different set of attributes to define our model of an animal. The selection of these attributes can define the effectiveness of the model. You should recognize that there are always trade-offs made when defining a model, including complexity, amount of space required, etc.4. What is a model in dbt?
A model in dbt represents something more specific than a basic data model - it represents the various transformations performed on the raw source datasets. These transformations are typically written in SQL, though newer versions of dbt can use Python for models / transformations. We won't be covering Python models in this course. Each model contains a SELECT query, transforming the source data as desired. These queries are saved in a text file, with a .sql extension. dbt automatically uses these files when tasked with operations, such as dbt run.5. Simple dbt model
Let's discuss the basics of creating a model in dbt. First, we create a directory under the models directory of your dbt project. This directory can be named anything, but may be referenced later, so it's best to keep it consistent. Next, a dot sql file is created as a text file. This can be done at the command line with a tool like touch, or via a text editor. We add the appropriate SQL statement to the text file. In this case, we select the first_name and last_name columns from the table source_table. Finally, execute `dbt run` to materialize the model.6. Reading from Parquet
Before continuing, let's discuss the Parquet format. Parquet is a columnar, binary format used to efficiently store data. It is widespread for sharing and distributing datasets. Columnar is in contrast to row-based formats like database tables. DuckDB can read Parquet files, without needing to import the data first. This is done using the read_parquet function in a SQL query. For example, SELECT * FROM read_parquet filename.parquet. You can also reference the Parquet file directly using single quotes.7. Let's practice!
There's more to learn about models in dbt, but first let's practice what we've covered in the exercises ahead.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.