
Stacking DataFrames

1. Stacking DataFrames

In addition to what we have seen, pandas has some reshaping methods that are designed to work on DataFrames with multi-level indexes.

2. Row multi-indices

In this slide, we see a DataFrame with a multi-level index on the rows. Why would we want one? A MultiIndex, also known as a multi-level index, allows us to store and manipulate multidimensional data in simple DataFrames.

3. Setting the index

Let's start by learning how to create a MultiIndex. Imagine we have the following DataFrame. There are several ways to create a multi-level index.

4. Setting the index

The simplest one is to use the set_index() method. In the example code, we specify that we want the columns country and age to be set as row indices. We also set the inplace argument to True to change the original DataFrame directly. As a result, we get a DataFrame with a multi-level index on the rows.
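
To make this concrete, here is a minimal sketch of the set_index() call. The column names country and age come from the example described above; the score column and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: only the column names "country" and "age" come from the text.
df = pd.DataFrame({
    "country": ["USA", "USA", "Spain", "Spain"],
    "age": [25, 40, 25, 40],
    "score": [7.1, 6.4, 8.0, 5.9],
})

# Move "country" and "age" out of the columns and into a two-level row index.
# With inplace=True the method modifies df itself and returns None.
df.set_index(["country", "age"], inplace=True)

print(df)
print(df.index.names)
```

Without inplace=True, set_index() returns a new DataFrame and leaves the original untouched, so we would assign the result back to a variable instead.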

5. MultiIndex from array

Another option is to use the from_arrays() method of MultiIndex. In this case, we define a list of lists named new_array. Each inner list holds the values for one index level. We call the from_arrays() method, passing new_array and a list of names we want for the levels. We assign the result to the original DataFrame through its index attribute. As a result, we get a DataFrame with two index levels on the rows: "member" and "credit_card".
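
A short sketch of this approach could look as follows. The names new_array, member and credit_card follow the description above; the balance column and the values inside the lists are assumptions chosen only to make the example run.

```python
import pandas as pd

# Hypothetical DataFrame with a default integer index.
df = pd.DataFrame({"balance": [120, 80, 300, 45]})

# One inner list per index level, each with one value per row.
new_array = [
    ["yes", "yes", "no", "no"],         # values for the "member" level
    ["visa", "amex", "visa", "amex"],   # values for the "credit_card" level
]

# Build the MultiIndex and assign it through the index attribute.
df.index = pd.MultiIndex.from_arrays(new_array, names=["member", "credit_card"])

print(df)
```

Each inner list must have exactly as many elements as the DataFrame has rows, otherwise the assignment raises an error.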

6. MultiIndex DataFrames

We could also define a DataFrame with multi-level indexes on the rows and the columns.

7. MultiIndex DataFrames

The process is very similar. We create two MultiIndexes using the method from_arrays(): one for the index and one for the columns. When we create the DataFrame, we set the index and the columns to be the recently created multi-level indexes. As a result, we get a DataFrame with multi-level indexes on the rows and on the columns.
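
As a rough sketch, the two MultiIndexes can be built and passed to the DataFrame constructor like this. The level names (group, patient, year, measure) and all the numbers are hypothetical; they only stand in for the data on the slide.

```python
import pandas as pd

# Multi-level index for the rows (hypothetical level names and labels).
row_index = pd.MultiIndex.from_arrays(
    [["first", "first", "second", "second"], ["A", "B", "A", "B"]],
    names=["group", "patient"],
)

# Multi-level index for the columns (hypothetical level names and labels).
col_index = pd.MultiIndex.from_arrays(
    [[2019, 2019, 2020, 2020], ["height", "weight", "height", "weight"]],
    names=["year", "measure"],
)

# One row of data per row label, one value per column label.
data = [
    [170, 70, 171, 72],
    [160, 55, 160, 57],
    [182, 90, 182, 88],
    [175, 68, 176, 69],
]

patients = pd.DataFrame(data, index=row_index, columns=col_index)
print(patients)
```

A single cell is then addressed with one tuple per axis, for example patients.loc[("first", "B"), (2020, "weight")].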

8. The .stack() method

The stack() method will reshape the DataFrame with a multi-level index by converting it into a stacked form.

9. The .stack() method

In other words, stacking means rearranging the innermost column index to become the innermost row index.

10. Stack into a series

Let's take our DataFrame that had a multi-level index on the rows and apply the stack() method. Its columns have a simple, single-level index, so stack() moves that only column level into the rows and produces a Series, as we can see in the output.
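
Here is a small sketch of that behaviour. The index level names member and credit_card follow the earlier example; the two columns and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical DataFrame: two-level row index, single-level columns.
df = pd.DataFrame(
    {"balance": [120, 80], "limit": [500, 300]},
    index=pd.MultiIndex.from_arrays(
        [["yes", "no"], ["visa", "amex"]],
        names=["member", "credit_card"],
    ),
)

# The only column level moves into the rows, so no columns remain
# and the result is a Series with a three-level index.
stacked = df.stack()

print(type(stacked))
print(stacked)
```

The former column labels (balance and limit here) become the innermost level of the row index.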

11. Stack into a DataFrame

Now, let's work with the patients data. This DataFrame has a multi-level index in the columns. We'll apply the stack() method. As a consequence, stack() will compress the last level in the columns to produce a DataFrame, as we see in the output.
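
The sketch below mimics this case with a made-up stand-in for the patients data. The level names year and measure, the patient labels and the values are assumptions, not the data from the slide.

```python
import pandas as pd

# Hypothetical stand-in for the patients data: two column levels.
col_index = pd.MultiIndex.from_arrays(
    [[2019, 2019, 2020, 2020], ["height", "weight", "height", "weight"]],
    names=["year", "measure"],
)
patients = pd.DataFrame(
    [[170, 70, 171, 72], [160, 55, 160, 57]],
    index=pd.Index(["A", "B"], name="patient"),
    columns=col_index,
)

# The last column level ("measure") moves into the rows.
# One column level ("year") is left, so the result is still a DataFrame.
stacked = patients.stack()

print(stacked)
```

Because one column level remains after stacking, the output keeps its tabular shape, with one row per patient and measure.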

12. Stack a level by number

It is also possible to choose which level to stack. In the example code, we want to stack the first column level, so we set the level argument to zero. Now, the stacked level becomes the new lowest level in the row multi-level index. It's important to remember that if we don't set the level argument, stack() will move the last level by default.
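
As an illustration, the following sketch stacks the first column level by number and compares the default with level=-1. The DataFrame is a hypothetical stand-in for the patients data, with assumed level names and values.

```python
import pandas as pd

# Hypothetical stand-in for the patients data: two column levels.
col_index = pd.MultiIndex.from_arrays(
    [[2019, 2019, 2020, 2020], ["height", "weight", "height", "weight"]],
    names=["year", "measure"],
)
patients = pd.DataFrame(
    [[170, 70, 171, 72], [160, 55, 160, 57]],
    index=pd.Index(["A", "B"], name="patient"),
    columns=col_index,
)

# level=0 selects the first (outermost) column level, here "year".
by_number = patients.stack(level=0)
print(by_number)

# With no level argument, stack() uses level=-1, the last column level.
same_as_default = patients.stack().equals(patients.stack(level=-1))
print(same_as_default)
```

After stacking level 0, the year labels sit in the innermost row level and the measure labels remain as columns.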

13. Stack a level by name

Our DataFrame has named column levels, so we can specify the level to stack by passing in the level name. In the code, we set level to year. In the resulting DataFrame, we see that the year level has now become the innermost row level.
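
A minimal sketch of stacking by name is shown here. The level name year follows the narration; the second level name, the patient labels and the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the patients data with named column levels.
col_index = pd.MultiIndex.from_arrays(
    [[2019, 2019, 2020, 2020], ["height", "weight", "height", "weight"]],
    names=["year", "measure"],
)
patients = pd.DataFrame(
    [[170, 70, 171, 72], [160, 55, 160, 57]],
    index=pd.Index(["A", "B"], name="patient"),
    columns=col_index,
)

# Refer to the column level by its name instead of its position.
by_name = patients.stack(level="year")
print(by_name)

# In this DataFrame "year" is level 0, so both calls give the same result.
print(by_name.equals(patients.stack(level=0)))
```

Using the name is more robust than using the number when the order of the column levels might change.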

14. Let's practice!

Now you know how to stack DataFrames. Let's practice!