Pivot tables

1. Pivot tables

We've just seen the pivot method to reshape DataFrames. Now we are going to learn how to use another method, the pivot table.

2. Pivot method limitations

As we mentioned before, the pivot method has some limitations. Let's remember a previous example. The values for name and variable are the same. But the value for imperial system is different. So when we try to apply the pivot method, it fails.

3. Pivot method limitations

The pivot method is a great general purpose pivoting technique. However, it requires the index column pair to be unique. This is mainly due to the fact that pivot method can not aggregate values. What do we do in those cases, like the one from the previous example?

4. Pivot table

Pandas provides with another method; the pivot table method. A pivot table is a DataFrame that contains statistics that summarizes the data from a larger DataFrame.

5. Pivot table

To convert from this DataFrame in long format

6. Pivot table

to this DataFrame with aggregated values, we can use the pivot table method as you see on the slide. It's important to notice that with this method we can also summarize DataFrames that are not in long format. This method takes several arguments.

7. Pivot table

The index argument takes the name of the column we want to have as an index in the new pivoted DataFrame. And the columns argument takes the name of the column we want to have as each column in the new DataFrame. Similar to what we saw for the pivot method.

8. Pivot table

The values argument takes the name of the column which values we want to aggregate in the new pivoted DataFrame. In this case, we can also specify an aggregation function. If we omit this value, pandas will understand you want to take their average. But it is always a good practice to clarify which function we want.

9. Pivot table

So in our previous example, we could now apply the pivot table method. We set name as the index, variable as the columns, and we specify we want the mean. Now, the method works perfectly. For the row with repeated index column pair, we can observe the mean of the original values.

10. Hierarchical indexes

Another advantage of pivot tables is that we can have multi-level indexes. To clarify this, we'll work with the following dataset.

11. Hierarchical indexes

We'll apply the pivot table step by step. We set the column "movement" as columns.

12. Hierarchical indexes

Then, we pass a list of columns to the index: the columns "first" and "last".

13. Hierarchical indexes

Finally, we pass the columns "overall" and "attacking" as the values we want to aggregate. And we specify we want to get their maximum value. As we can observe in the slide, we get a pivoted DataFrame with multi-level index not only in the columns, but also in the rows: the indexes first and last.

14. Margins

Finally, we would like to get the number of attacking and overall scores each player has. In the pivot table method, we omit passing the values argument. This tells pandas to pivot all values.

15. Margins

But we'll pass the margins argument. When this parameter is set to True, all the columns and rows will be added. In our case, we'll get the total count for each row and column.

16. Pivot or pivot table?

How do we know if we should apply pivot or pivot table? Ask yourself. Does the DataFrame have more than one value for each index/column pair? Do I need a multi-level index or a summary statistics of a large DataFrame? If the answer to any of these questions is yes, then you should use pivot table.

17. Let's practice!

Now, it's time to use pivot tables.

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.