1. Reshaping and aggregating data
Welcome to this Power BI course where you'll learn about transforming data in Power BI. My name is Khaled Choucri, and in this lesson we will focus on transforming the shape and structure of your tables, as well as aggregating the data in your columns.
2. Understanding dataset shapes
An important concept to understand is that the same information can be represented in different ways through different dataset shapes. When we say shape, we are referring to how data is organized within the table.
There are two main structures, or shapes, a dataset can take: Long, or Wide.
A long dataset has repeating categorical/id values in the first column and usually only one column containing a numerical value. This structure is easier for a computer to interpret and work with.
Conversely, a wide dataset contains only unique values in the first column, and multiple numerical columns, one for each variable. This structure is more human-readable and easier to analyze at a glance, as it represents a summarization of the long data structure.
We can see that these two tables contain the same exact information, but are organized differently. With Power Query, you can change the shape of your data by using certain transformations.
3. Transforming data shape
There are two transformations in Power Query that allow us to alter the structure of our dataset from long to wide and vice versa. These transformations are called pivot and unpivot.
Pivoting a table allows you to construct new columns in your dataset based on the values contained in a selected column. In this graphic, we can see that the "Variable" column was pivoted, which created three new columns: "Points", "Assists", and "Rebounds". This changes the structure of the dataset from long to wide. You may be familiar with this transformation if you have used Pivot Tables in Excel before.
The inverse of the pivot transformation; unpivot, allows us to select one or more columns to "flatten", which transforms the selected columns into a category-value pair. This transformation will essentially transform your dataset from a wide structure to a long one. In this example, the "Points", "Assists", and "Rebounds" columns were unpivoted, which resulted in the creation of two columns, "Variable" and "Value".
4. Transforming data shape
Another Power Query transformation that reshapes your data is the Transpose table transformation. This transformation switches the rows in your dataset with the columns and the columns with your rows.
This transformation can come in handy when you wish to change the orientation of your dataset without necessarily doing any aggregations on the underlying data as you would normally do with the pivot transformation.
If you apply the transpose transformation twice, you will always arrive at your original dataset.
5. Aggregating with group by transformation
There are times when you will want to work with a summarized version of your dataset by changing the granularity of your dataset. To do this, we use the group by transformation in Power Query.
When we use group by, we define one or more columns that contain our desired grouping hierarchy and then apply an aggregation function on the relevant columns to compute statistics for those groups.
There are many aggregations that can be applied, not just sum. You can calculate the average, median, minimum, maximum, or count of rows in that grouping. You can even apply custom aggregation functions, which we will learn in the last chapter of this course.
A good example is if we are interested in total and average weekly sales from a dataset containing daily sales, so we select the week column to "group" our data by, then aggregate the sales by using the sum aggregation as well as the average aggregation.
6. Let's practice!
Let's make use of these transformations to reshape and aggregate sales data from the Northwind Traders dataset.