1. Data warehouse data modeling
Hello again. Now we will discuss data modeling. Data modeling refers to how we organize data in a database into tables and how to relate those tables if we want to join them.
2. Data models
When it comes to data modeling, the star and snowflake schemas are two common approaches for a warehouse using the bottom-up Kimball approach. Schema means to plan or model. So, star and snowflake schemas are models for organizing our data.
These data models denormalize organizational data into dimensions and fact tables. Next, we will look at an example to better explain dimension and fact tables in a data model.
3. It's Bravo again!
For example, recall our hypothetical company that sells fancy home office furniture named Bravo.
4. Fact table
A fact table consists of measurements or metrics about an organizational process. These are known as facts. Each row within the table captures a measurement or metric about one process transaction.
For example, Bravo tracks sales order information in a fact table that includes measures like the quantity of a product sold and the taxes collected for each of Bravo's sales orders.
This fact table also includes information about the customer who purchased the product, the date, and the product sold. These values are not quantities on which we can perform mathematical operations but are foreign keys that allow us to join to other tables named dimension tables that provide valuable information and detail about the sales order. For example, we could find more information about each customer who purchased an item by joining the sales order fact table to the customer dimension table.
Tracking if a customer is a strategic customer is usually not an element of the fact table but of the dimension table.
5. Dimension table
Dimension tables contain attributes and characteristics that describe the data in the fact table. These characteristics are called dimensions.
For example, Bravo's customer dimension table includes customer attributes like name, account number, email address, and mailing country address. It is a reference table. By joining this dimension table with the sales fact table, we can analyze Bravo's total sales by country. This is valuable information because it adds more detail to the total sales metric.
We joined the fact table with just one dimension table in this example. By joining the fact table with other dimensional tables, such as the product and date dimensional tables, we can analyze the total sales with a larger set of questions. This is what a star schema is.
6. Star schema
A star schema is an organizational structure that uses a single fact table and one or more dimensional tables. The central table, surrounded by dimensional tables, somewhat resembles a star.
The relatively few joins needed for a star schema make queries fast and easy to use by organizational users.
7. Snowflake schema
A snowflake schema is similar to a star schema. However, at least one of the dimensional tables does not directly join the fact table and must be joined through another dimensional table. For example, if we extend our Bravo star schema by adding another dimensional table for customer countries, the data model becomes a snowflake. This is because we cannot join the country table directly to the fact table but only through the customer table. Through this additional join, we could analyze total sales by customer language. We now have a richer dataset to answer more questions.
With that, we now know what fact and dimensional tables are and how they fit into the data models of star and snowflake schemas.
8. Let's practice!
Let's practice what we've learned.