How to calculate in Power BI and Python

1. How to calculate in Power BI and Python

We will be learning how to create the correlation coefficient in Power BI, then use Python. To do this, we will return to the dataset from the previous demo. This pair plot shows the pairwise relationship between “Mnt” variables (amount spent on product categories) and the “Income” variable. We can see some relationships look more like random blobs and some with more defined positive shapes. Let’s quantify these relationships with a correlation coefficient - creating this first in Power BI. Do this with a new quick measure; right-click on the dataset and choose “New quick measure”. Find the “Correlation coefficient” option in the “Calculation” dropdown menu. Add “CustomerID” to “Category” as this is the lowest grain of the data we want to find the correlation over. “Measure X” will be “MntRegularProds” and “Measure Y” will be “MntTotal”. Make sure they are using a “Sum” aggregation. Click OK. Power BI will generate the required DAX formula for the calculation; it is quite complex, so we will not cover it in length here. Let’s rename this to “correlation_regular_total”. Create a card visualization on this page using the new quick measure. The correlation coefficient is one, indicating a very strong positive relationship, which is also visible in the scatter plot with the tight linear shape moving up and to the right. This is a bit tedious to do for each pair of variables. A handy trick is to create a correlation heatmap which is super easy in Python. On a new page, we’ll create a new Python visual with all of the “Mnt” variables. Import the two packages - matplotlib dot pyplot and seaborn. First, we create a correlation matrix using the dot corr() method on our dataframe; save it as “corrMatrix”. Then, we pass this to the seaborn dot heatmap() function. We can set the “annot” parameter to True to label each correlation coefficient. Send the plot to render in Power BI with plt dot show(). Now you can see all of the pairwise relationships and correlations at once. Heatmaps are also useful for visualizing other types of relationships as well. Two common examples are analyzing the retention of a cohort of customers over multiple weeks or simply the amount of something over time and across categories. Let’s focus on the latter. On a new page, we’ll create a final python-based visualization showing the average total amount of items purchased across Age and whether the customer is single or not. Drag those columns into the Values section. As you know by now, import matplotlib dot pyplot as plt and seaborn as sns. For this type of heatmap, we need to reshape the data, specifically, create a pivot table. We can do this with the pandas dot pivot() method on our dataset DataFrame. The first two parameters are “index”, for example the rows of the pivot table, and “columns”. In this case, we will use marital_Single and Age, respectively. The final parameter is “values” - we will use the MntTotal variable. From here, we call the seaborn dot heatmap function, inputting our new data. Show our plot. At first glance, it looks relatively similar. However, the color does appear to be getting lighter or increasing in value as Age increases. This is generally true for both single and non-single customers. Awesome. Now it’s your turn to measure relationships quantitatively and build heatmaps !

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.