Get startedGet started for free

Reshaping and combining data

1. Reshaping and combining data

We have been reshaping DataFrames using methods such as stack and unstack. Now, we'll learn how to combine them with grouping and statistical functions.

2. Reshaping and statistical functions

Let's imagine we have the following DataFrame that contains data about sales in two countries in two different years. The DataFrame contains multi-level indices, country and year. We could perform some statistical operations on this DataFrame.

3. Statistical functions

Such as calculating the sum, mean, median or difference between columns.

4. Stacking and stats

To obtain the total amount of online and onsite sales by year in the two countries, we chain the stack and the sum functions and apply it to the sales DataFrame as you see in the code. We set axis to 1 to apply it over the column axis. The stack method returns a DataFrame with a new inner-most index level, the shop label. The sum method gives the total amount of products sold.

5. Stacking and stats

To turn the shop level of row index into a column index, we could chain the previous sequence to the unstack method. This will yield the same results as before but in a different format, a wider format. Now, online and onsite are column labels of the DataFrame.

6. Unstacking and stats

To obtain the mean amount of products sold for each year in both countries, we can unstack the first level of sales and then, apply the function mean. Again, we set axis to 1 as an argument to specify the column axis. As a result, we get the following DataFrame. We can observe that year is the index, and the column is the mean of all products sold in both countries.

7. Unstacking and stats

Now, we'll get the difference in the amount of office supplies sold in 2018 compared to 2017. First, we select the office supply column of sales. Then, we unstack the country level.

8. Unstacking and stats

After that, we apply the function diff. This function finds the difference between columns or rows. We set the axis parameter to 1 to apply it over the columns. Periods is set to two, so it calculates the difference between a column and every other column. The output is a DataFrame with values representing the difference between the first and third column, and the second and fourth column as seen in the code.

9. Reshaping and grouping

The stack and unstack functions can also be used with grouping functions such as groupby. We want the total amount of different products by online or onsite regardless of the country. First, we will stack the DataFrame as you can see in the code. This will result in the shop column being the innermost level of the index. You can observe it in the first four rows of the resulting DataFrame.

10. Reshaping and grouping

After that, we group the data by the type of shop. We set the level parameter to the desired column name, shop. Then, we chain it with the sum function. As a result, we will get the total amount of office supply or technology purchased online or onsite as shown.

11. Reshaping after grouping

When grouping columns, the resulting DataFrame will have the grouping factor as the row index. In the code, we group the data by year and then, calculate the median. The output is a DataFrame where year is now the row index.

12. Reshaping after grouping

We can chain these operations to the stack and unstack function to reshape the data. We stack the first and second level, and then unstack the year level, we get a DataFrame containing the mean amount of products sold by year where shop and product are now the row indexes.

13. Let's practice!

Now, it's your turn to put these concepts into action!