Data warehouse design example
1. Data warehouse design example
Welcome back! This time we will walk through a data warehouse design example.2. Let's set the stage
Let's give context for our hypothetical example. We work for a new company, a startup, that has developed a popular photo-sharing app. Photographers use the platform to sell their photos. Our role is to design and deploy a data warehouse for the company.3. Top-down, or bottoms up approach?
As a startup, we must show business results quickly. There are many strengths of a top-down approach, but the implementation time tends to be longer versus a bottom-up approach. Therefore we decide on a bottom-up approach. Additionally, the sales group was flagged in our requirements gathering as the priority for the data warehouse.4. Kimball - select the organizational process (step 1)
Next, we need to start modeling our data. Since we chose a bottom-up approach, we will likely use a snowflake or star schema, and we can use Kimball's four-step process to design the tables. In step one, we need to select the organizational process. Our requirements gathering uncovered that the sales group is trying to understand better what type of customers purchase photos from the app. In addition, the sales group would like to analyze the data to prioritize which customers they reach out to. Therefore, we will focus on building a fact and dimension table that tracks customer purchases.5. Kimball - Declare the grain (Step 2)
In step two, we declare the data grain. In this case, tracking each purchase by a customer as a row in the fact table will provide the most flexibility. Additionally, it is the lowest level of detail for purchases.6. Kimball - Identify the dimensions (Step 3)
Next, we identify the dimensions we will track. Again we are trying to prioritize customers. We think about how other users describe the data that results from the sales group prioritizing customers. Some important features are a customer's location, joining date, and default payment method. Additionally, we would have a dimension table for photos, and dates, that we won't detail here.7. Kimball - Identify the facts (Step 4)
In step four, we want to identify the facts. To determine this, we ask ourselves what we are answering. Here we decided to track how much time was spent viewing a photo, which photo was purchased, and the cost of the picture. These features will give insight into engaged customers that we expect to buy more.8. Fact and dimensions tables
After finishing the four steps, we have our fact and dimension tables. In our simplified example, we have only a handful of features. Typically, there would be many more. We now have a star schema data model and can move to other planning areas.9. On-premise or cloud implementation
Another decision we need to make is whether to implement our data warehouse on-premise or in the cloud. As a startup, we are limited in how much we can spend on infrastructure upfront, and we tend to have small teams. Therefore, we chose a cloud implementation so our team can focus on high-value activities.10. ETL or ELT implementation
Finally, we must decide if we use an ETL or ELT process. The organization prioritizes keeping as much data as possible because it may have future value. Additionally, since we have a cloud implementation, we can scale the resources as needed, saving on costs. Therefore, we decide on an ELT implementation. We have completed many critical planning decisions and need to focus on executing the implementation and deploying the warehouse. We would repeat this cycle for another department.11. Summary
In summary, the planning phase for designing a data warehouse is critical, where we can evaluate the impact of our different decisions. But, more importantly, understanding the strengths of each approach will allow us to tailor the data warehouse to our needs.12. Let's practice!
Time for practice!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.