1. Evaluating performance
In this demo, we’ll see how to create a snowflake schema and learn how to use the performance analyzer.
Here in the model view, you can see the star schema I created earlier. If we take a closer look at the Geography dimension, we can see that it is made up of hierarchical data. Country first, then State. One of the main differences between star and snowflake schemas is how they handle hierarchical data. Since in the snowflake schema dimensions can be connected to other dimensions, we can split out the state and country information. Let’s do exactly that!
I’ll go into Power Query by clicking Transform data. I’ll duplicate the geography table twice and rename the new tables Country and State. Let’s think about how we want to connect to the fact table. The most detailed level of geography is State, and that is also the id that is present in the fact table. So we’ll connect the State table to Establishment Survey. We’ll also need to connect the State table to the Country table, which means we need to keep the Country id as well. But the actual country value can be removed here. I’ll right-click and select Remove. Next, we’ll look at the Country table. This one should contain only country information, so I’ll remove all state data. Let’s also remove the duplicate rows. Looks like there is only one country in our data at the moment. Note that if the country value would have to be updated, we only have to do it in one place in this snowflake schema, whereas it would be a lot more cumbersome to do this in the star schema from before. Let’s close and apply and take a look at the data model.
Note that I’ve kept the Geography star dimension as well, so we can easily compare performance later on. In a real-life situation, you would use only one of the two approaches. You can see that Power BI has automatically created relationships between the star dimension and the snowflake dimensions. This is not right. Let’s remove them. I’ll right click on the line and select Delete. Let’s do the same for the other table. There we go! So these are our snowflake dimensions. Now I’m going to create a relationship between the fact and State tables. I could drag the column names on top of eachother or use the Manage relationships button here. I’ll select New and choose the two tables. You can see that Power BI has recognized State id as the key, which is correct. Click OK and close. Lastly, you can see that the relationship between State and Country is a dotted line, which means it’s inactive. Let’s quickly change that by going into Properties. There we go. That’s our snowflake schema. And this would be our star schema.
I’m going to go into the Report view and create two visualizations. First, I’ll create a bar chart with the number of establishments by state, using the field from the star dimension. Then, I’ll do the same but use the snowflake dimension for the state. Let’s quickly rename these visuals to star schema and snowflake schema.
Now, we can compare the performance. I’ll go to View here at the top and click on Performance analyzer. This pops out the Performance analyzer pane. I’ll start recording and then refresh the visuals. This will show me the time it takes to load the two visuals on the page in milliseconds. If I refresh a couple of times, we can see that one schema isn’t consistently faster than the other. With our small dataset everything loads pretty quickly, but if you’re working with thousands or millions of rows, the star schema should be more efficient. This is because star schemas will only join the fact table with the dimension tables, leading to faster execution times.
That’s it, time for some exercises!
2. Let's practice!