
Numerical transformations in Power Query

1. Numerical transformations in Power Query

Transforming numerical data is a crucial skill for data analysis. We will learn how to use Power Query's features to fix errors and improve our numerical columns so that they produce higher-quality analysis.

2. Why should you clean data?

In 2016, a Harvard Business Review article cited an estimate from IBM that bad data costs the US around 3.1 trillion dollars a year! That is a staggeringly large number, representing around one-sixth of the United States' GDP at the time. The article went on to explain that stale and unverified data was being used to make huge economic decisions, which ended up backfiring and forced companies to take dramatic measures to recover those losses. A good rule of thumb to follow is the 1-10-100 rule. It costs a data analyst $1 to verify that a single row of data is valid. It costs $10 for that data analyst to clean that row if any missing or incorrect data is found. Finally, it costs $100 if you leave the data as is and do nothing. Think about how many rows of data a single company has, and that $3.1 trillion figure no longer seems so unrealistic.
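To make the 1-10-100 rule concrete, here is a minimal Python sketch of the arithmetic. The row counts are made up for illustration, the function names are hypothetical, and we assume that the $10 cleaning cost is paid on top of the $1 verification cost, which the rule itself does not spell out.

```python
# Per-row costs from the 1-10-100 rule of thumb, in dollars.
COST_VERIFY = 1    # verify that one row is valid
COST_CLEAN = 10    # clean one row that turned out to be bad
COST_IGNORE = 100  # leave one bad row in place and do nothing


def cost_of_cleaning(total_rows: int, bad_rows: int) -> int:
    """Verify every row, then clean the bad ones.

    Assumption: cleaning is charged in addition to verification.
    """
    return total_rows * COST_VERIFY + bad_rows * COST_CLEAN


def cost_of_doing_nothing(bad_rows: int) -> int:
    """Skip verification and leave every bad row as is."""
    return bad_rows * COST_IGNORE


# Made-up example: 10,000 rows, 500 of which contain bad data.
total_rows, bad_rows = 10_000, 500
print(cost_of_cleaning(total_rows, bad_rows))  # 15000
print(cost_of_doing_nothing(bad_rows))         # 50000
```

Under these assumptions, checking and cleaning the whole table costs less than a third of what ignoring the bad rows would cost.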

3. What is clean numerical data?

Most of your analysis will depend on the contents of your numerical data columns, and basing your results on unclean numbers could lead to catastrophe. Ideally, you want your numerical columns to be completely free from missing values and errors, so that the measures we build using DAX will function correctly. Outliers will also affect your analysis, but handling them requires more advanced tools, which we will cover later. You can use a variety of mathematical transformations on your numerical columns if you have a good reason to apply them. The most useful of these will likely be the absolute value transformation, which converts every negative value in a column to its positive equivalent. Applying a logarithmic transformation to a column is a way to translate an exponential relationship into a linear one. This is an advanced transformation that should generally only be applied if you have a good statistical or mathematical basis for using it. Sometimes you will want to multiply all your observations by a certain value, or add a certain value to them. This can also be accomplished with Power Query's numerical transformations. As a final note, you can save memory and make your columns more readable by rounding your data in Power Query to reduce the number of decimal places. This is especially recommended if you have many rows in your dataset.
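To see what these transformations do to the numbers themselves, here is a small Python sketch. It is not Power Query code, and the values are made up. It only illustrates the arithmetic behind each transformation, and Power Query's own rounding rules may differ from Python's.

```python
import math

# Made-up values standing in for one numerical column.
values = [-12.5, 3.14159, 100.0, 0.5]

# Absolute value: negative numbers become positive, magnitudes are unchanged.
absolute = [abs(v) for v in values]

# Multiply every observation by a constant, then add a constant.
scaled = [v * 2 + 10 for v in values]

# Round to two decimal places.
rounded = [round(v, 2) for v in values]

# Logarithm: a column that grows by a factor of 10 at each step
# (exponential growth) turns into one that grows by 1 at each step (linear).
# The logarithm is only defined for positive numbers.
growth = [1, 10, 100, 1000]
log_growth = [math.log10(v) for v in growth]

print(absolute)    # [12.5, 3.14159, 100.0, 0.5]
print(rounded)     # [-12.5, 3.14, 100.0, 0.5]
print(log_growth)  # [0.0, 1.0, 2.0, 3.0]
```

Notice that the absolute value changes the sign of -12.5 but keeps the row, so the column still has the same number of observations afterwards.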

4. On date columns

Date and time columns are special types of numerical columns, as the standard numerical transformations cannot be applied to them. However, there is a series of transformations dedicated to them, and these will come in handy when you are preparing a date dimension to analyze your dataset. You can extract specific properties of a date column, like the year and month, as well as derived properties such as the start or end of a particular year. This can be very useful for your analysis if you need to calculate averages based on how many days have elapsed in the year. You can even extract the age from a date column, which is typically done when you have the birth dates of employees listed in your dataset and need to figure out how old they are. We encourage you to experiment with the options available in Power Query for date columns, as they are quite extensive and varied.
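The date properties described above can be sketched in Python using only the standard library. Again, this is not Power Query code. The function names and dates are hypothetical, and the way Power Query represents an age may differ from the whole-year count computed here.

```python
from datetime import date


def date_properties(d: date) -> dict:
    """Return a few properties that can be derived from a single date."""
    start_of_year = date(d.year, 1, 1)
    end_of_year = date(d.year, 12, 31)
    return {
        "year": d.year,
        "month": d.month,
        "start_of_year": start_of_year,
        "end_of_year": end_of_year,
        # Days elapsed in the year, counting the date itself.
        "days_elapsed": (d - start_of_year).days + 1,
    }


def age_in_years(birth_date: date, today: date) -> int:
    """Whole years between birth_date and today."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


# Made-up dates for illustration.
props = date_properties(date(2024, 3, 15))
print(props["year"], props["month"])  # 2024 3
print(props["days_elapsed"])          # 75
print(age_in_years(date(1990, 6, 30), date(2024, 3, 15)))  # 33
```

The days elapsed value is the kind of figure you would divide by when calculating a year-to-date daily average.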

5. Let's practice!

Let's try this out ourselves by working on some sales data for Adventure Works!