1. Granularity, measures, and hierarchies
Let's talk about granularity, measures, and hierarchies.
2. Understanding granularity
Granularity refers to the level at which data is stored with respect to dimensions. This is the minimum level of detail we can query--we can aggregate data to higher levels, but won't be able to break it down to lower levels.
You can get a good understanding of the grain of data by listening for the word "by" when describing a table. You might define a table by customer, by product, by day, so the granularity is that each row represents the combination of one customer, one product, and one day.
In the example here, the granularity is by ID, by NAICS code, by establishment age, by year. Note that fields like number of firms are not counted as part of the grain: number of firms are measured based on the combination of dimensions.
3. Handling granularity in Power BI
If you have data at the annual level for example, there's no great way to break that data down by day. In some circumstances, you might use an allocation rule to spread values across the individual days, but those are estimations and introduce known inaccuracy to your data.
By contrast, going from a finer grain to a coarser grain is easy to do with Power BI, by aggregating and grouping.
In Power BI, you can specify aggregations and how to summarize data.
Grouping is the more detailed process, allowing you to specify the grouping of columns as well as specific aggregations.
Both methods have two key benefits. First, with fewer rows, we can get better query performance when displaying visuals. Second, by storing fewer rows, we reduce the amount of memory needed to store this data which can improve refresh time.
4. Measures
Measures are fields, or combinations of fields, which can be aggregated or calculated. They make up the majority of fields in fact tables.
Measures can be gathered directly from the raw, imported data, such as the fields from this fact table. They can be aggregated using sum, average, or count for example.
New measures can also be calculated from raw data to provide more insights.
5. Creating measures
When you bring a numeric value into Power BI, it is automatically aggregated using the sum. You can change the summarization to other aggregation methods, or use DAX.
DAX is a powerful tool to create aggregations or new measures, and is covered in-depth in other Power BI courses, although we'll see some in this lesson.
If you're new to DAX, Power BI has a dialog, called Quick measures, to let you create specific types of measures very easily. Most quick measures let you choose a calculation, base it on a value, and group the data by some other value. Each quick measure has its own template, which is great for learning how to create moderately complex measures.
6. Hierarchies
Lastly, let's talk about hierarchies. Hierarchies allow users to drill down into the data dimensions. There are two types of hierarchies.
Natural hierarchies fit clearly in the real world. A classic example of this is in the date dimension: each year is made up of months, which are made up of days. Each level fits cleanly into the level above. By contrast, week is not part of this hierarchy, as a week may span across two months or two years.
Artificial hierarchies exist only for the purpose of querying. For example, the intake year, favorite color, and favorite sport of students don't flow naturally from one another, but if users always drill down from intake year into favorite color and then favorite sport, this hierarchy makes sense.
7. Let's practice!
Let's test your knowledge on hierarchies before moving to Power BI.