1. Stacking DataFrames
In addition to what we have seen, pandas has some reshaping methods that are designed to work on DataFrames with multi-level indexes.
2. Row multi-indices
In this slide, we see a DataFrame with a multi-level index on the rows. Why would we want one? A MultiIndex, also known as a multi-level index, allows us to store and manipulate multidimensional data in a simple two-dimensional DataFrame.
3. Setting the index
Let's start by learning how to create a MultiIndex. Imagine we have the following DataFrame. There are several ways to create a multi-level index.
4. Setting the index
The simplest one is to use the set_index() method. In the example code, we pass a list with the columns country and age, so that they become the levels of the row index. We also set the inplace argument to True to change the original DataFrame directly instead of getting back a new one. As a result, we get a DataFrame with a multi-level index on the rows.
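To make this step concrete, here is a minimal sketch. Only the column names country and age come from the lesson; the height_cm column and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: only the column names country and age come from the text.
df = pd.DataFrame({
    "country": ["Spain", "Spain", "USA", "USA"],
    "age": [25, 40, 25, 40],
    "height_cm": [180, 175, 170, 165],
})

# Move both columns into the row index, changing df itself.
df.set_index(["country", "age"], inplace=True)

print(df.index.nlevels)                    # 2
print(df.loc[("Spain", 40), "height_cm"])  # 175
```

Without inplace=True, set_index() returns a new DataFrame, so we would write df = df.set_index(["country", "age"]) to get the same result.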
5. MultiIndex from array
Another option is to use the MultiIndex.from_arrays() method. In this case, we define a list of lists named new_array. Each inner list holds the values for one level of the index.
We call the from_arrays() method, passing new_array and a list of names we want for the levels. We then assign the result to the index attribute of the original DataFrame.
As a result, we get a DataFrame with two levels in the row index: "member" and "credit_card".
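A small sketch of this approach could look as follows. The names new_array, member and credit_card come from the lesson; the spend column and every value are assumptions chosen only to make the example run.

```python
import pandas as pd

# Hypothetical DataFrame with a default integer index.
df = pd.DataFrame({"spend": [120, 80, 45, 60]})

# One inner list per index level, each as long as the DataFrame.
new_array = [
    ["no", "no", "yes", "yes"],   # values for the member level
    ["no", "yes", "no", "yes"],   # values for the credit_card level
]

# Build the MultiIndex and assign it to the index attribute.
df.index = pd.MultiIndex.from_arrays(new_array, names=["member", "credit_card"])

print(df)
```

Each inner list must have exactly as many elements as the DataFrame has rows, otherwise the assignment fails.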
6. MultiIndex DataFrames
We could also define a DataFrame with multi-level indexes on the rows and the columns.
7. MultiIndex DataFrames
The process is very similar. We create two MultiIndexes using the from_arrays() method: one for the index and one for the columns. When we create the DataFrame, we pass them as the index and columns arguments. As a result, we get a DataFrame with multi-level indexes on the rows and on the columns.
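As an illustration, the sketch below builds such a DataFrame. The level name year appears later in the lesson; the names ward, patient and feature, the variable names, and all values are hypothetical.

```python
import pandas as pd

# Row MultiIndex: one inner list per level (hypothetical labels).
row_index = pd.MultiIndex.from_arrays(
    [["A", "B"], ["x", "y"]], names=["ward", "patient"]
)

# Column MultiIndex: the outer level is year, the inner level is feature.
col_index = pd.MultiIndex.from_arrays(
    [[2019, 2019, 2020, 2020], ["age", "weight", "age", "weight"]],
    names=["year", "feature"],
)

# Pass both MultiIndexes when creating the DataFrame.
patients = pd.DataFrame(
    [[25, 68, 26, 72], [41, 80, 42, 78]],
    index=row_index,
    columns=col_index,
)

print(patients)
```

A single cell is then addressed with one tuple per axis, for example patients.loc[("A", "x"), (2020, "weight")].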
8. The .stack() method
The stack() method reshapes a DataFrame with a multi-level index by moving a level of column labels into the rows, which gives a taller, stacked form.
9. The .stack() method
In other words, stacking means moving the innermost column level so that it becomes the innermost row level.
10. Stack into a series
Let's take our DataFrame that had a multi-level index on the rows and apply the stack() method. Its columns have only a single level, so stack() moves that one level into the row index. No column level is left, so the result is a Series, as we can see in the output.
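Here is a minimal sketch of that behavior, using made-up data with country and age as row levels; the columns height_cm and weight_kg are assumptions.

```python
import pandas as pd

# Hypothetical DataFrame: two row levels, a single level of columns.
df = pd.DataFrame(
    {"height_cm": [180, 175, 170, 165], "weight_kg": [80, 78, 70, 66]},
    index=pd.MultiIndex.from_arrays(
        [["Spain", "Spain", "USA", "USA"], [25, 40, 25, 40]],
        names=["country", "age"],
    ),
)

# The only column level moves into the rows, so nothing is left as columns.
stacked = df.stack()

print(type(stacked).__name__)  # Series
print(stacked.index.nlevels)   # 3
```

On some recent pandas versions, stack() may also print a FutureWarning about a newer implementation; that should not change the result for this small example.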
11. Stack into a DataFrame
Now, let's work with the patients data. This DataFrame has a multi-level index on the columns. We'll apply the stack() method. This time, stack() moves only the last column level into the rows. One column level remains, so the result is still a DataFrame, as we see in the output.
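The following sketch shows the same idea on a small stand-in for the patients data. The level name year comes from the lesson; ward, patient, feature and the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the patients data: two column levels.
patients = pd.DataFrame(
    [[25, 68, 26, 72], [41, 80, 42, 78]],
    index=pd.MultiIndex.from_arrays(
        [["A", "B"], ["x", "y"]], names=["ward", "patient"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [[2019, 2019, 2020, 2020], ["age", "weight", "age", "weight"]],
        names=["year", "feature"],
    ),
)

# By default the innermost column level (feature) moves into the rows.
stacked = patients.stack()

print(type(stacked).__name__)     # DataFrame
print(list(stacked.index.names))  # ['ward', 'patient', 'feature']
```

The year level stays on the columns, which is why we get a DataFrame back rather than a Series.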
12. Stack a level by number
It is also possible to choose which level to stack. In the example code, we want to stack the first column level, so we set the level argument to zero.
Now, the stacked level becomes the new innermost level of the row index. It's important to remember that if we don't set the level argument, stack() moves the last (innermost) column level by default.
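As a sketch of stacking by level number, the example below again uses a small hypothetical version of the patients data, where year is column level 0 and feature is column level 1.

```python
import pandas as pd

# Hypothetical stand-in for the patients data: two column levels.
patients = pd.DataFrame(
    [[25, 68, 26, 72], [41, 80, 42, 78]],
    index=pd.MultiIndex.from_arrays(
        [["A", "B"], ["x", "y"]], names=["ward", "patient"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [[2019, 2019, 2020, 2020], ["age", "weight", "age", "weight"]],
        names=["year", "feature"],
    ),
)

# level=0 selects the first (outermost) column level, here year.
by_year = patients.stack(level=0)

print(list(by_year.index.names))    # ['ward', 'patient', 'year']
print(list(by_year.columns.names))  # ['feature']
```

Leaving level out is the same as passing level=-1, which stacks feature instead of year.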
13. Stack a level by name
Our DataFrame has named column levels, so we can also specify the level to stack by passing in the level name. In the code, we set level to "year". In the resulting DataFrame, we see that the year level has now become the innermost row level.
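Stacking by name might look like the sketch below. As before, the DataFrame is a hypothetical stand-in for the patients data, and only the level name year is taken from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the patients data with named column levels.
patients = pd.DataFrame(
    [[25, 68, 26, 72], [41, 80, 42, 78]],
    index=pd.MultiIndex.from_arrays(
        [["A", "B"], ["x", "y"]], names=["ward", "patient"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [[2019, 2019, 2020, 2020], ["age", "weight", "age", "weight"]],
        names=["year", "feature"],
    ),
)

# Refer to the column level by its name instead of its position.
by_name = patients.stack(level="year")

print(list(by_name.index.names))  # ['ward', 'patient', 'year']
```

Using the name gives the same result as level=0 here, and it keeps working if the order of the column levels changes later.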
14. Let's practice!
Now, you know how to stack DataFrames. Let's practice!