Get startedGet started for free

Transforming text in Power Query

1. Transforming text in Power Query

Hi there! My name is Maarten Van den Broeck, and I'll be your guide for this part of the course. This chapter is all about transforming the data in your columns, whether it is numerical or text data. First, we will focus on text data.

2. What is clean text data?

You will often need to clean the text columns in your data before analysis as they are prone to data entry errors as well as inconsistencies. Firstly, you should make sure your text columns don't have any obvious typos or incorrect values in them. This can be a little time-consuming to fix in Power Query so it is better that you fix these errors in the source file if possible. Next, your data should be consistently formatted, meaning that you should not represent the same data point in more than one way, such as having the country column in your data set have both USA as well as United States in it, since these are both referring to the same country. Likewise, make sure that you are using the same kind of capitalization for all the data points. This will prevent confusion and make the overall experience for you and your report users better. You should also remove all blank space that appears before and after your data, which sometimes appears when loading data from certain sources. The same can be said for punctuation and control characters which are particularly common when loading data from older data sources and from the web. Finally, a general paradigm to follow is that each column should only represent one piece of information, so you might need to split or combine columns in your dataset to achieve this.

3. How to clean text data?

Luckily for us, Power Query has many features dedicated to efficiently cleaning text columns, saving us a great deal of time and effort. These features can be found in the "Transform" ribbon, in the "Text" section. Clicking on "Format" gives you different options for capitalization as well as access to two of the most useful transformations when it comes to text data - Trim & Clean. These two transformations should generally be applied to every single text column in your dataset, since they can only positively affect your data. "Trim" will get rid of all trailing and leading whitespace, extra blank space before and after your data. If there is no trailing or leading whitespace, the data is left as is. "Clean", on the other hand, will remove all control characters such as new lines or carriage returns from your data. Although this is less common than whitespace, it can still happen and affects how your data appears when it is loaded into Power BI.

4. Splitting and combining columns

An important concept that is particularly relevant to text columns is one related to the specificity of data in your columns. To help us reach the optimal structure for analysis, we are sometimes required to split or merge columns in our dataset. When your data has columns which contain too many pieces of distinct information, it can become hard to analyze and make use of those columns. Therefore it is almost always better to split that column into two or more columns that each contain a distinct piece of information. A good example of this is splitting an "Address" field into Building, Street, City, Country columns. Sometimes, two or more columns in your dataset are more useful when they are combined. This is is not a general rule but will depend more on your analysis needs. A good example is combining First Name and Last Name into a single Name column.

5. Let's practice!

Let's try this out on our dataset and learn how useful the text transformations in Power Query can be.