Get startedGet started for free

Introduction to Star Schemas

1. Introduction to Star Schemas

In this video, we’ll talk about the star schema, a key data structure that facilitates data analysis.

2. Dimensional Modeling basics

A Star Schema is a type of dimensional model, a data structure that is optimized for data warehousing and analysis. Data warehouses contain data about quantifiable business events.

3. Dimensional Modeling basics

For example, the sale of a product at a retail shop is a business event, as shown in this receipt.

4. Dimensional Modeling basics

The event information includes numeric data, such as the number of units sold and the sale price.

5. Dimensional Modeling basics

It also includes some context, such as when the sale was made, which product was sold, and which shop sold it.

6. Storing event data in tables

The sale event described in the receipt can be stored in a database table, with one row per sale line item as shown here.

7. Storing event data in tables

The table that stores all event transactions is called a fact table.

8. Storing event data in tables

Some of the table columns provide context, such as when and where the sale was made, what product was sold and who sold it. These are the dimensions.

9. Storing event data in tables

Other columns store the numeric data such as how many units were sold. These are the measures.

10. Storing event data in tables

Other columns provide additional information about the transaction, for example the receipt number. These are called attributes.

11. Normalization

We could choose to store the data in this way. However, note that the data has many repeated values. All sale records from a given store have the same store name. If all transactions were saved like this, we would use a lot of space to store the same information multiple times. Instead, we can create a separate table with the store names, assign a numeric key to each store. Then, we can replace the store name with the store key in the fact table. The resulting tables will take less space on disk because numbers take less space than text columns. This process of storing data in a way that reduces duplication is called normalization.

12. Fact Tables + Dimension Tables

If we follow the same normalization approach for all the dimension columns, we will end up with a model like this.

13. Star Schema

The resulting schema resembles a star with the fact table at the center and the dimensions surrounding it.

14. Snowflake Schema

Sometimes dimensions have attributes that can become dimensions by themselves. For example, a City attribute on a Customer dimension table.

15. Snowflake Schema

We can further normalize the Customer dimension table by creating a new City dimension.

16. Snowflake Schema

When we normalize all dimension tables, the result is called a Snowflake Schema.

17. Star Schema vs Snowflake Schema

Models based on Star Schema take more storage than Snowflake Schema because of the use of denormalized tables. However, Star Schema models have the advantage of being simpler to build and understand than the more complex Snowflake Schema. Star Schemas also results in faster query performance because it doesn't require as many JOINs as the Snowflake Schema. You can choose the model most suitable for your needs. Generally, Star Schema is preferred because it results in simpler, faster models.

18. Let's practice!

Now, let's do a couple of exercises to get more familiar with these data modeling scenarios.