1. Slowly changing dimensions
Hello again. We will continue our discussion of data modeling by reviewing Kimball's approach to modeling slowly changing dimensions. It should be noted that there is a modern approach to this challenge, but we will review first the historic standard approach. Okay, let's start our discussion.
2. The challenge
Over time, some values in our dimensional tables will need to be updated.
For example, imagine we are working on a data warehouse for a used car dealership. We have a record in our product dimension table for Tesla's model Y. It was initially classified as an electric vehicle, but now we want to update values to something more descriptive, electric-crossover. There are three solutions for this problem: type I, II, and III. This only applies to dimension tables. The value of the metrics captured in a fact table should stay the same over time. For example, suppose we are capturing revenue in a fact table. In that case, the total revenue for a company last year should stay the same over time, excluding capturing the correct data initially.
3. Type I
The Type I methodology updates the row's value to the current value.
This will update the value for any new reporting, but it also erases any history of the previous "electric vehicle" value.
If we rerun any historical reports, this car will now show under the electric-crossover category. Therefore, these historical reports will not match the original values. In some situations, this will not be acceptable. The reproducibility of reports is often an important requirement for organizations.
4. Type II
In Type II, we add a new row for Tesla to the dimension table with a new id and the updated category value.
This will allow us to maintain any historical reports because those historical records in the fact table will reference product id 12345 in the dimensional table, and any new records in the fact table will reference 20053. Finally, we can add columns to track the start and end of when the values are in effect.
Now that we have looked at Types I and II, let's look at type III.
5. Type III
With type III, we add another column to the dimension table to capture the previous category. So, we set the value of the past category to electric vehicle and update the category to the new value of electric crossover. We will reference the past category column in the dimension table if we need to rerun a historical report.
This approach supports looking at both a current and historical view simultaneously.
However, this may require updating the coding for historical reports, and we are limited to tracking two historical changes in this setup. In theory, we can add more columns to track more changes. Additionally, we can add a column to track the date on when we change the category.
We have now reviewed the three standard Kimball approaches to slowly changing dimensions. However, there is a modern approach.
6. Modern approach
A modern approach takes advantage of the low storage costs by saving snapshots of the entire dimension table. We would update the value of the dimension table similarly to Type I, and the warehouse system stores snapshots of the current state of the dimension table. Therefore, when rerunning historical reports, we use the dimension table snapshot that was current at the time. Typically, dimension tables have significantly fewer rows than fact tables, and storing duplicate copies of the tables is a manageable task for modern hardware.
7. Let's practice!
Okay, let's practice what we learned in this video!