1. Introduction to Data Shaping and Grouping
This final chapter introduces you to some data transformation capabilities for correctly shaping and grouping data within the Power Query editor.
2. Shaping and Grouping Data
So why do we need to worry about shaping and re-grouping data in Power Query? We already did some of this in the Excel workbook using Pivot Tables.
Well, often in an analytics workflow, data may not be in the ideal format for the desired analysis, and when that is the case, these transformations need to be done more upstream.
Shaping and re-shaping transformations involve restructuring the data, such as transposing columns into rows or vice versa.
Grouping or Aggregation of data might be needed when data is too granular, and summarization is required for higher-level insights and summary statistics.
3. ETL
Let's revisit our overall ETL schematic to see where shaping and aggregation fit in as the focus of this final chapter.
4. ETL
As we can see here, reshaping and data aggregation unsurprisingly fall in the transformation part of our ETL workflow within the Data Transformation step. We will review use cases for both transformations and how the functions are performed in Excel Power Query.
5. ETL
We will also take a peek at the Query Dependencies Viewer as we build on queries in the editor.
6. ETL
Similar to previous chapters, we will continue to work with query loads in our Excel Workbook for further analysis and visualization.
7. Transpose
Let's first take a look at the transpose operation. This data transformation reshapes rows and columns in a table, effectively swapping them.
Therefore, each column in the original table becomes a row in the transposed table, and each row becomes a column.
For example, in this sample table, the section column marked in purple becomes a row, along with the patient data in green and blue.
This operation is useful when you want to re-organize data, especially when you have data in a horizontal format that you want to convert into a vertical format.
8. Group By
Next, the Group By operation is used to group data based on one or more selected columns and apply aggregation functions to the grouped data to create summary information.
9. Group By
Let's say we want to take this original table and aggregate Section A and Section B by calculating a sum for each.
Using the group by operation with SUM gives us a summary table that retains section values in rows and patient values in columns but adds a summary total for each section.
Group By is useful for aggregating data within the existing table, which is particularly handy for creating summary statistics for specific groups or categories within your data.
In this case, grouping by 'Section' has given us an interesting insight into Patient 1, who had a much higher score in Section A than in Section B.
10. Pivot
Last is the pivot operation. Pivoting a table ultimately re-shapes the data to create a cross-tabulation by rotating rows into columns.
Back to our original sample table here, we can pivot the purple column "Section" so each of its values (A and B) becomes its own column, and Patient1 and Patient2 are flipped to rows.
11. Pivot
The pivot operation then allows you to aggregate data for each group.
In this case, performing a SUM of values for Patient 1 and Patient 2 for each section.
Pivoting data can be useful in creating summary tables or crosstabs.
12. Transpose vs. Pivot vs. Group By
So how will we know when to perform these operations, whether to transpose, pivot, or group-by?
We can first consider the ultimate goals of the downstream analysis.
13. Transpose vs. Pivot vs. Group By
If reshaping (for instance, rows to columns or columns to rows) will be necessary to align the data for analysis, you might consider performing a transpose operation.
14. Transpose vs. Pivot vs. Group By
Will aggregation be necessary? For instance, do we want to avoid loading granular data in favor of higher-level grouped data? Then consider performing a Group By operation.
15. Transpose vs. Pivot vs. Group By
Otherwise, there might be instances where both reshaping and aggregation are required.
16. Transpose vs. Pivot vs. Group By
In this case, you might be required to pivot the data to create the necessary cross-tabulation to produce your desired aggregation.
17. Let's practice!
Let's get practicing!