Get Started

Exploratory data analysis in PivotTables

1. Exploratory data analysis in PivotTables

In this screencast, we’ll review how to perform exploratory data analysis, or EDA, using PivotTables. The first thing we should always do when starting EDA with a new dataset is to explore it. PivotTables are a great tool for summarizing data, which makes them useful in EDA. We can easily aggregate a wide range of variables. For example, let’s focus our exploration on Discounts. We can drag Discounts to the Values section, and it will automatically summarize it as a sum of discounts given. Let’s format this so it's easier to read. There are many ways to do this, but the way we’ll do this is to right-click on the value, open the Value Field Settings, then Number Format, and here we have a lot of customizations to choose from. Let’s format this as a number with a comma separator and zero decimal places. Then we’ll hit OK. Let’s also add an average of Discounts. It automatically creates another sum of discounts and titles this Sum of Discounts2, so let’s change the summarization to average, and then rename this to Average of Discounts and add our number formatting. Let’s try to find a relationship between Discounts and the number of units sold. First, add the Sum of Units Sold using the same steps here. One relationship we could calculate would be the ratio of Discounts per unit sold. But wait. That isn’t an option here within the PivotTable. Don’t worry! That is where Calculated Fields come in. Let’s click on the PivotTable Analyze ribbon, then Fields, Items, & Sets, and then Calculated Field. First, let’s set our name to Discount Per Unit. Then, in the formula, select Discounts, then the backslash to divide, then Units Sold. Click add. If we go to the dropdown menu here, we can see all the calculated fields we make. If we ever need to modify the calculations, we can select the field name, edit the formula, then click modify. Once we’re done, we can click OK. Let’s format this as a currency with two decimals. There is an average of $8.18 in discounts for each unit sold. Knowing this is interesting, but now I have another question, “Is this the same for each country?”. Let’s find out by adding Country to the rows. There seems to be an even bigger pattern here. Let's group these locations by continent. This isn’t a field that we have available, but we can easily create a custom group by holding Control and clicking on the Row Labels in the Pivot. Let’s click on Canada, Mexico, and the USA and then right-click, then group. Then, do the same for France and Germany. We’ll rename Group 1 to North America and Group 2 to Europe. Here the pattern is very clear: we offer higher discounts to companies in North America, especially in the USA. This is the beauty of PivotTables. We can easily analyze and split up data, which makes our exploration much easier! Now it’s your turn to give it a try.

2. Let's practice!