Get startedGet started for free

Grouping by multiple columns

1. Grouping by multiple columns

Now that you've recreated a basic spreadsheet pivot table in Python, let's take it a step further.

2. Fruit sales

If we look at fruit-underscore-sales again briefly, it's a dataset of transactions.

3. Fruit sales

It has 32 rows and 4 columns.

4. Fruit sales

And since many transactions could contain one kind of fruit, you'll notice how multiple rows contain Apples purchased at Pete's Discount Fruit. But what if we just needed a summary of each fruit purchased in each store? Fortunately in Python, we can group by both store and product name, then sum.

5. Adding a list of column names

Here's what the code looks like. We now pass what's called a list containing the column names we wish to group by instead of just a single column in quotations.

6. What is a list?

Lists are a part of Python's built-in functionality. Lists are containers for a collection of values you wish to keep together. Here, we have defined a variable, shopping-underscore-list, which contains three items, milk, eggs, and cheese. Each item is separated by a comma, and the entire list is encased in a set of brackets.

7. By store, by fruit

Let's work through this across multiple lines of code. On the top line here, we've defined a variable called groups. The groups variable contains a list of column names we wish to group by. Note that each column name is in quotations. On the next line, we've defined a variable, fruit-underscore-sales-underscore-less, which contains our familiar dot-groupby-dot-sum, but now the first argument of our groupby is the variable groups. The result is a condensed table where there is one line for each fruit purchased in each store.

8. By store, by fruit

Observe how there is only one row for Apple at Pete's Discount Fruit, and the overall size of our DataFrame has halved from 32 rows to 16 rows, all while preserving the important information in the dataset.

9. The benefits of grouping by more columns before .sum()

You may be asking yourself, what do you gain through grouping by more than one column before using dot-sum? First, it's simply good to know you have the option, and you're not beholden to either grouping by one column or not grouping at all. Second, it's a powerful technique for reducing your dataset down to the information you truly care about for an analysis. And lastly, it's a really great trick for helping those who are trying to process way too much data in a spreadsheet. If you can import their data, reduce their row count, and hand them back a smaller, more manageable file containing the most relevant information, you're going to make your spreadsheeting colleague very happy.

10. Your turn!

Now you can try grouping by multiple columns.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.