1. Star and snowflake schemas
Welcome back! Let's continue building on your knowledge from earlier. In this video, we'll take a look at snowflake schemas, an extension of the star schema.
2. Star schema
Remember from earlier that a star schema consists of one or more fact tables surrounded by dimension tables.
3. Snowflake schema
The snowflake schema is like a star schema, except it allows relationships between dimensions. In the example you can see that the Lender and Property dimensions each link to other dimension tables.
Note that fact tables remain the same.
4. A closer look
The biggest difference between the two styles is how they handle hierarchical data. Star dimensions tend to have all levels of a hierarchy in the same table, whereas with snowflake dimensions, hierarchy levels are explicitly broken out into multiple tables.
Here you can see a Product dimension table. Imagine that each product has a name. Each product also belongs to a product subcategory, which itself belongs to a product category.
In a star schema, all levels of the hierarchy, as well as their attributes, show up on the same product dimension. With a snowflake schema, each level of the hierarchy becomes its own table and we join those tables, usually with keys.
5. Comparison
When it comes to dimensional modeling theory, we strongly prefer star schemas over snowflake schemas. The key reason is that star schemas are easier for business users to understand. They don't want to worry about keys or hierarchies. The other benefit is that quite a few business intelligence tools have been optimized for the star schemas.
Although dimensional modeling theory leans heavily toward star schemas, we do see snowflake schemas in some data warehouses. The reason for this is that star schemas duplicate quite a bit of data, which leads to storage costs and can impact performance. Also, star schemas are not ideal for frequently-updated data, especially with large dimensions. Suppose you have millions of rows in a dimension containing a column for country. When a country name changes, you may have to update a large number of rows with the new country name. By contrast, with a snowflake schema this would be an update of a single row.
6. Stars and snowflakes in Power BI
When it comes to Power BI in particular, it is important to note that both schemas work. If you have a snowflake schema in an existing warehouse, you could import the data as-is, making migrations easier.
That said, Power BI does prefer star schemas for the same reason as dimensional modeling theory. Because it is easier for users to understand. Furthermore, there are some optimizations in Power BI which make performance much less of a concern.
7. The performance analyzer
Power BI has a built-in performance analyzer. When enabled, it keeps track of at least three key measures on each visual.
The first is how long it took to read the data from its internal database and then perform any DAX operations on the data.
The second component is figuring out how long it took the visual to render.
Finally, a third measure captures everything else, which is typically waiting time on other operations, including waiting for cross-filtering operations to complete.
8. Performance tuning advice
There are a number of ways you can improve performance. If the DAX query takes a long time to complete, you could tune your DAX operations or improve your data loading performance. This might include improving your data model!
If the big problem is in visual display, use less complicated visuals and show less information on the screen. Power BI needs to render each data point, so plotting tens of thousands of points may take a while.
If the Other value is the cause of your slowness, you might want to reduce the number of visuals on the page.
9. Let's practice!
Time to practice!