Get startedGet started for free

Data Vault

1. Data Vault

Hello! Let's dive into data vault modeling, a unique approach to data warehousing for long-term storage. It keeps data flexible and scalable. We'll dissect its core components and how they organize complex datasets. Ready to start?

2. Introduction to the data vault model

Data Vault modeling is a method for handling large data volumes. It excels in tracking historical data, being easy to audit and adapt. It involves three essential parts: hubs, links, and satellites. We'll explore each one.

3. Components of data vault

Hubs are unique concepts in our models, identified by a single business key. They are the fundamental entities. For example, in a university data model, hubs might be students, classes, and schools.

4. Components of data vault (1)

Links illustrate the connections between various hubs. In a university data model, a link could join the student hub to the classes hub, representing a student's enrollment in a class.

5. Components of data vault (2)

Satellites hold the descriptive, changing details linked to hubs and links. They capture dynamic data like a student's address or class schedule related to specific hubs or links. Satellites track attributes that change over time, keeping a history of these changes.

6. Building hubs

Let's begin building our university data vault model. First, we'll define our hubs; students, classes, and schools. We'll create each hub similarly, but let's walk through creating the students' hub as our first step.

7. Building hubs (1)

To keep things clear, we'll name our hubs as 'hub_' followed by the entity's name, making it easy to identify hub components. First, let's define the hub key, the unique identifier for each entity. Snowflake offers an AUTOINCREMENT property for attributes, automatically generating a unique value for each row and simplifying data entry for hubs. We'll set 'student_key' as the primary key.

8. Building hubs (2)

9. Building hubs (3)

Lastly, we'll include 'load_date' and 'record_source' attributes in our hubs for historical tracking. We are using the TIMESTAMP datatype for the 'load_date' column in Snowflake. This data type stores dates without a timezone, ensuring the date entered stays the same.

10. Building hubs (4)

In the way, we'll create 'hub_classes' and 'hub_schools'. Each gets an auto incremental hub key as the primary key. We'll then add business keys, 'class_id' for classes and 'school_id' for schools, plus 'load_date' and 'record_source' for historical tracking.

11. Building links

Next are links, which connect hubs. In our example, the link is the enrollment of students in classes. Let’s create this new entity.

12. Building links (1)

The link_key will identify this entity, as a primary key with auto incremental values.

13. Building links (2)

The next step is to add the foreign keys referencing the hubs to create the relationship between them.

14. Building links (3)

And last, adding load_date and record_source for history tracking.

15. Building links (4)

The same exercise would be for the link offerings, where we establish the relationship between schools offering classes.

16. Building satellites

Ok, it is time to create the satellites! Satellites will contain all the details from the hubs. Let’s use students as an example.

17. Building satellites (1)

We will establish an entity named 'sat_student' to capture the details of students, specifically their names and email addresses.

18. Building satellites (2)

Then, we add 'load_date' and 'record_source' that will be the attributes for historical data traceability.

19. Building satellites (3)

And finally, we have to add 'student_key' to be the foreign key referencing to 'hub_students'.

20. Building satellites (4)

In the same way, we'll create Satellites for classes and schools, detailing each entity. These Satellites will also have attributes for tracking history and a foreign key referencing their respective hub.

21. Terminology and functions overview

Let's summarize new functions we have learned in this cheatsheet.

22. Functions overview

Take note of the implementations of the functions as well.

23. Let's practice!

So far, we have studied three great models: Entity-Relationship, Dimensional, and now Data Vault. They seem similar sometimes, but do not worry; we will compare the approaches to distinguish them clearly. We will now practice using a Data Vault, which provides a strong structure for managing complex historical data. Let's put these concepts into action!