Get startedGet started for free

Set up, import, and transform data

1. Set up, import, and transform data

In this video, we will compare how to import data and apply simple transformation steps using Python, then Power BI. Fortunately, the steps to enable Python scripting were already completed for us. Let’s test out this implementation by importing our first dataset via Python. Click “Get data”. Find “Python script” under the available options. All scripts start with importing pandas, commonly as pd, and other packages you might need. I’ll be using a dataset of transactions from an online retailer. You’ll be using this dataset in future exercises, so we will cover it in more detail later. We will import the “purchases dot csv” data, using the pandas read_csv() function, and save it as “dataset”. End the script using the print function with the “dataset” dataframe - press ok, then the “Navigator” window will appear with the dataset available. Just select and press “Load”. It looks like it is loaded, as we can see the dataset and columns in the “Fields” pane. Let’s explore some basic data transformations first in Python. There is another CSV file with categorization for each product. Let’s join that to the first dataset. Click on “Transform data” in the “Home” ribbon to open Power Query. Under the existing “dataset” table, click on “Source” in the “Applied Steps” to open the Python script. We are going to rename “dataset” to “purchases”, then import the second CSV as “categories”. The pandas dot merge() function joins two DataFrames together. The first two arguments will be our DataFrames - purchases and categories. Next, set the “how” argument to “right”, since we want to understand the number of transactions per categorization, even if the StockCode value is not in the left dataset - in essence no transactions happened. Both datasets have “StockCode”. This is what we will use to perform our join. Set “on” to “StockCode”. Save as “py_combined”, then print the DataFrame. Click on the yellow “Table” text next to “py_combined”. If Power BI asks about replacing subsequent steps, click “Continue”. Great! We can see the complete dataset; the “Category” column is appended to the end. Now to do this in Power BI involves a couple of steps. Let’s start by first importing the two CSV files - purchases and categories. With the first one selected, click on the “Merge Queries” dropdown, then “Merge Queries as New” to join the two datasets together into a new dataset. Here, select the second dataset, then the columns to perform the join with - for example “StockCode”. Finally, select “Right” under “Join Kind”. Rename to “combined”. To get the columns from the second table, click on the expand button on the column created from the merge, then select the columns we want, specifically, “Category”. Python and Power BI can perform the same task through different means. This demo showed a simple example. As you mature as a data analyst, you will take on projects involving more complex data joins and transformations. Python offers a more effective way of tackling these tasks while keeping the visualization capabilities of Power BI. Now let’s get started with using Python in Power BI!

2. Let's practice!