1. Column Transformations
In this chapter, we will delve into column transformation principles and introduce some of the many options available in Excel Power Query.
2. Overview
Often, you will find that your raw source data needs to be developed further to prepare it for analysis.
This is when column transformation becomes the next necessary step in your data workflow. We can think of column transformation as modifying data in some way to meet your data analysis requirements.
3. Column transformations in the ETL journey
Revisiting our journey along the ETL path, we can consider this step of column transformation to live within the Data Cleaning and Preparation phase.
4. Column transformations in the ETL journey
In this step, we will explore transformations, including adding new columns.
5. Column transformations in the ETL journey
And in this chapter, we will continue to use data visualization techniques in the Excel workbook.
6. When is column transformation needed?
So, how do we know when column transformation is needed? It depends largely on the characteristics of the data source and the ultimate goals of the eventual analysis.
Primarily, column transformations are necessary when some type of standardization is required, for instance, adjusting numeric units or conforming text to a specific number of characters.
7. When is column transformation needed?
Data enrichment might come from deriving certain values from an existing column. This might provide important additional context needed for downstream analysis.
8. When is column transformation needed?
Often, a calculation is necessary to produce values to meet analysis goals. This might be a combination of two columns to derive an insight.
9. When is column transformation needed?
Lastly, storage optimization is a consideration, particularly for larger datasets. Transformations might be required to optimize data for faster processing, storage, and retrieval.
10. Transformation types
Many different types of transformations can be done to column values.
The main categories we will cover include numeric, date/time, and text.
11. Numeric transformations
Numeric transformations might be as simple as rounding a value to the nearest whole number or deriving whether a value is even or odd.
Transformations might also involve the calculation of column statistics, for instance, the sum or count of the values.
Often, arithmetic operations are required. This can include adding or multiplying two columns together, or combining one column with a constant. Deriving percentages via division is another common numeric transformation.
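To make these numeric transformations concrete, here is a minimal sketch in plain Python. It is only an illustration of the ideas, not how Power Query performs them, and the sample values and variable names are hypothetical.

```python
# Hypothetical sample columns: units sold and unit price for four orders.
units = [3, 8, 5, 10]
prices = [2.49, 1.75, 4.10, 0.99]

# Rounding: each price to the nearest whole number.
rounded = [round(p) for p in prices]

# Even/odd: flag whether each unit count is even.
is_even = [u % 2 == 0 for u in units]

# Column statistics: sum and count of the values.
total_units = sum(units)
order_count = len(units)

# Arithmetic: column times column, and column times a constant.
revenue = [u * p for u, p in zip(units, prices)]
doubled = [u * 2 for u in units]

# Percentage via division: each order's share of total units.
share_pct = [u / total_units * 100 for u in units]
```

Each result is a new list, which mirrors the idea of deriving a new column while leaving the original values alone.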
12. Date/time transformations
Date/time transformations can combine date and time fields to create a date/time stamp.
Extracting certain components from a date or time can also be useful, such as deriving a month's name from a date.
Rounding operations might include deriving the start of an hour from the time.
Lastly, date arithmetic is often useful, for instance, calculating an age from two dates or the duration in minutes between two time fields.
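The date/time transformations described above can be sketched in plain Python with the standard `datetime` module. This is an illustration only, not Power Query itself, and all dates, times, and names below are hypothetical.

```python
import calendar
from datetime import date, datetime, time

# Hypothetical separate date and time fields.
order_date = date(2024, 3, 15)
order_time = time(14, 37, 20)

# Combine: build a single date/time stamp from the two fields.
stamp = datetime.combine(order_date, order_time)

# Extract: derive the month's name from the date
# (English names, assuming the default locale).
month_name = calendar.month_name[stamp.month]

# Round: derive the start of the hour from the time.
start_of_hour = stamp.replace(minute=0, second=0, microsecond=0)

# Date arithmetic: duration in minutes between two date/time values.
delivered = datetime(2024, 3, 15, 16, 7, 20)
duration_minutes = (delivered - stamp).total_seconds() / 60

# Date arithmetic: age in whole years between two dates.
birth = date(1990, 6, 1)
had_birthday = (order_date.month, order_date.day) >= (birth.month, birth.day)
age = order_date.year - birth.year - (0 if had_birthday else 1)
```

The age calculation subtracts one year when the birthday has not yet occurred in the later year, which a simple difference of years would miss.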
13. Text transformations
A common text transformation is to split column contents, for instance, by extracting the first or last characters from a text string.
Converting text cases is often needed when standardizing data with a mix of upper and lower cases.
Sometimes, text fields require concatenation to combine their contents.
Text length might also be needed, for instance, to derive the number of characters in a text field.
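Here is a minimal plain-Python sketch of the four text transformations just mentioned. It illustrates the concepts only, not Power Query's own functions, and the sample strings are hypothetical.

```python
# Hypothetical text fields with inconsistent casing.
first_name = "ada"
last_name = "LOVELACE"
invoice_code = "INV-2024-0042"

# Split/extract: first three and last four characters.
prefix = invoice_code[:3]
suffix = invoice_code[-4:]

# Case conversion: standardize mixed upper and lower case.
first_clean = first_name.title()
last_clean = last_name.title()

# Concatenation: combine two text fields into one.
full_name = first_clean + " " + last_clean

# Text length: number of characters in the field.
code_length = len(invoice_code)
```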
14. Column transformations in Excel Power Query
So, how do we perform these transformations in Excel Power Query? Thankfully, there are many built-in no-code options that we will explore.
An important concept to master is when to "Add Column" versus "Transform" when conducting a transformation in Excel Power Query.
When we "Add Column", we introduce new columns based on existing data while maintaining the original column as-is. In contrast, "Transform" modifies and replaces the content of existing columns.
Which approach to take will largely depend on your specific data workflow and analysis needs.
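The difference between the two approaches can be sketched in plain Python. This is a conceptual illustration under stated assumptions: the helpers `add_column` and `transform_column` are hypothetical names written for this example, not Power Query functions, and the sample rows are made up.

```python
# Hypothetical source table: a list of rows, each row a dict.
source = [{"city": "  lima "}, {"city": "OSLO"}]

def add_column(rows, new_name, fn):
    """Return new rows with an extra column; existing columns stay as-is."""
    return [{**row, new_name: fn(row)} for row in rows]

def transform_column(rows, name, fn):
    """Return new rows where the named column's values are replaced."""
    return [{**row, name: fn(row[name])} for row in rows]

def clean(text):
    # Trim surrounding spaces and standardize the casing.
    return text.strip().title()

# "Add Column" style: the original column is kept, a new one appears.
added = add_column(source, "city_clean", lambda row: clean(row["city"]))

# "Transform" style: the column's content is replaced in the result.
transformed = transform_column(source, "city", clean)
```

In this sketch, both helpers build new rows rather than editing `source`, so the original list is unchanged after either call.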
15. Common misconceptions
Let's recap some common misconceptions when it comes to column transformations.
First, not every column needs transformation! Transformations should be applied strategically to align with data and analysis requirements. Applying unnecessary transformations can actually introduce errors and reduce data integrity.
Second, transformations are not permanent. In tools like Excel Power Query, you can undo or modify them easily through the Applied Steps list.
Furthermore, when applying transformations, even via "Transform", you never modify the source data directly. The source data always remains intact.
Lastly, many common column transformations can be done without the need for complex custom calculations or M-Language code. This is what makes Excel Power Query a great tool for learning the basics.
16. Let's practice!
Let's practice!