Get startedGet started for free

Introduction to Power Query

1. Introduction to Power Query

Hello! My name is Khaled Choucri and I am pleased to welcome you to this Power BI course. Through these lessons we will learn the different ways we can use to transform our data to prepare it for analysis.

2. Why prepare data?

A core concept of good data analysis is that your output will only be as good as the dataset that you use as an input. The saying goes something like: "Garbage in, garbage out". In order to provide high quality analysis, you must first make sure your data is of high quality and "clean". That way, you will reach a meaningful conclusion about your data and won't waste time by working on dirty data.

3. What is clean data?

So how do we define "clean" data? There are some important characteristics of clean data that we must strive to attain before conducting analysis. Namely: - We want our data to be free from any missing values, sometimes called nulls. - We want our data to have no typos or data entry errors. - We should remove duplicate entries from our dataset to prevent skewing statistics. - We should focus only on the data that is relevant to us, removing any data which will not help us answer our questions. - Outliers should be treated with care, not necessarily removed from our data as they can be important sources of information. We can apply a floor or a ceiling to the outlier values so that they do not skew our statistics too much. This means that we "cap" the outliers at a certain value. - Columns should have the correct data types to allow us to perform calculations on them. If a numerical column is stored as a text, we won't be able to make measures once the data is loaded in Power BI. - And finally, column and table names should be short & descriptive so we don't confuse our report readers and ourselves. Try to make sure your data doesn't look like this table here before proceeding to analysis.

4. How does data preparation work in Power Query?

So how exactly do we clean our data in Power BI? In order to do that, we use a tool called Power Query. This is the data manipulation framework that is present in Power BI which allows us to connect to or load our data, as well as transform it according to our needs. We then apply a series of transformations to our data, one after the other, so we can fix errors and clean it up. This sequence is shown on the right of the power query interface, under applied steps. You can consider this to be a "recipe" for how we are transforming our data, and this recipe is saved in your "query" which will ultimately load your data for analysis once we close the power query editor and apply those steps.

5. Data types in Power Query

There are 5 different data types that are native to Power Query: - Numbers - Date/Time - Text - Logical Data - Binary Data, which is encoded in base64 and most often used to store images. We should always make sure to select the correct data type for each column in our dataset. Power BI will also usually automatically detect the data type of each column and set it for you.

6. Basic structural transformation of data

Transforming our data structure means we won't change the data stored in our tables, only how it is organized. An example of this is promoting headers, which means to make the first row of the data into column headers. This can be very useful with Excel files. We can also change the order and sort our columns as well as remove any redundant columns or rows from our data. Finally, we can rename our columns to be more descriptive.

7. Let's practice!

Let's try this out ourselves, using the Adventure Works dataset.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.