Get startedGet started for free

Data warehouse architectures

1. Data warehouse architectures

Welcome back. In this video we consider two data warehouse architectures.

2. Inmon - top-down

The first architecture we will discuss was popularized by Bill Inmon, also known as the top-down approach. Bill viewed the data warehouse as a tool to contain all of the organization's data.

3. Inmon - top-down

Essential to this architecture is that the organization must decide on the naming, the definition, which data is valid if there are conflicts, and all other data cleaning operations on all of the data before it enters the warehouse.

4. Inmon - top-down

Additionally, this architecture stores data in the warehouse in a normalized form. Data normalization is a way of organizing data to reduce data redundancy and improve data quality.

5. Inmon - top-down

The data then moves to a department-focused data marts where end users and applications can query it.

6. Pros and cons of top-down

This approach has some advantages. First, conforming the input sources into a single definition that the organization agrees upon makes the data warehouse an effective source of truth. Furthermore, since the data is normalized within the warehouse, less storage is needed, and creating new data marts for reporting or analysis is relatively straightforward. However, because the data is normalized, it requires more joins for reporting leading to slower response times, which is one of the disadvantages of the top-down approach. Additionally, as you might imagine, gaining alignment by the organization on the data definitions can take a lot of upfront work resulting in a higher startup cost for warehouse projects.

7. Kimball - bottom-up

Ralph Kimball popularized the next popular data warehouse architecture, also known as the bottom-up approach. In this approach, once the data has been brought in, it is denormalized into a star schema. A star schema is a way of storing data that makes query writing fast and straightforward. In the Kimball approach, the focus is on getting from data to reporting as fast as possible. This is done by first organizing and defining the data definitions of one department of the organization, placing that data into a data mart, and making it available for reporting. After completing one department, a new department is chosen, and the cycle repeats.

8. Kimball - bottom-up

Various data attributes, such as name and location, connect the data marts. The data marts are then integrated into a data warehouse. It contrasts the top-down approach because the data moves to the data marts first and then to the warehouse.

9. Pros and cons of bottom-up

The advantages of the bottom-up approach include how fast it can get up and running by taking an incremental approach resulting in lower upfront costs on warehousing projects. Also, the denormalized data model makes the data easy to consume for users. However, denormalization increases the processing time within the ETL process and can create duplicate data when used across different data marts. For example, if the organization includes a field named revenue in two data marts, but in one, the revenue accounts for the cost of goods, while the other doesn't. Duplicate data can confuse and make the data warehouse less of a source of truth. An additional disadvantage is as the organization adds new departments or processes, more development will need to be done. The bottom-up approach has a lower upfront cost but requires more upkeep than the top-down approach. So when developing an architecture, it is essential to create one that works for our users while being aware of their strengths and weaknesses.

10. Let's practice!

Time to check our understanding of the top-down and bottom-up approaches.