1. Reshaping using pivot method
In the previous video, we learned what reshaping a DataFrame means.
Now, we'll start learning how to reshape DataFrames using the pivot method.
2. From long to wide
The long format is usually the most suitable to store a clean dataset.
However, we might want to show the relationship between columns, perform time series operations on the variables, or run any other operation that requires each column to hold a unique variable.
These are situations where we need another format.
3. From long to wide
So, we need a way to convert a dataset from long format, like the one you see in the slide
4. Pivot method
to a format where we can discover patterns.
The pivot method allows us to reshape the data from a long to a wide format.
5. Pivot method
It takes three arguments: index, columns, and values.
Let's see how they work.
6. Pivot method
The index argument takes the name of the column we want to have as an index in the new pivoted DataFrame. In our case, we want the column Year to be our new index.
7. Pivot method
The columns argument takes the name of the column we want to have as each column in the new DataFrame. In our case, this is the column Name.
8. Pivot method
Finally, the values argument takes the name of the column whose values we want to use to populate the new pivoted DataFrame.
In our case, we want the weight column. So the pivot method will transfer each weight value from the original DataFrame to the new DataFrame, where its row and column match the year and name of the original DataFrame.
9. Pivot method
If the method cannot find a row-and-column combination in the original DataFrame, it will set that cell to a missing value.
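To make the three arguments concrete, here is a minimal sketch. The DataFrame, its names, and its numbers are made up for illustration and are not the data shown on the slide; the only assumption is that the long table has the columns Year, Name, and Weight.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Year, Name) observation.
long_df = pd.DataFrame({
    "Year": [2019, 2019, 2020],
    "Name": ["Ana", "Ben", "Ana"],
    "Weight": [60, 82, 62],
})

# Year becomes the index, each distinct Name becomes a column,
# and the cells are filled with the matching Weight values.
wide_df = long_df.pivot(index="Year", columns="Name", values="Weight")
print(wide_df)
```

There is no row for Ben in 2020, so that cell of the pivoted DataFrame comes out as a missing value (NaN).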
10. Pivoting a dataset
As an example, we'll use the following long DataFrame with data about FIFA players' weight and height in the metric and imperial systems.
11. Pivoting a dataset
If we apply the pivot method, setting the index argument to the column name,
12. Pivoting a dataset
the columns argument to the column variable,
13. Pivoting a dataset
and the values argument to the metric system column, we obtain the following pivoted DataFrame.
Height and weight are now the columns and the player name is the index of the DataFrame.
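The call described above could look roughly like this. The column names name, variable, metric_system, and imperial_system are assumed from the narration, and the player names and numbers are placeholders rather than the real values from the slide.

```python
import pandas as pd

# Placeholder long-format data; column names are assumed from the narration.
fifa = pd.DataFrame({
    "name": ["Player A", "Player A", "Player B", "Player B"],
    "variable": ["weight", "height", "weight", "height"],
    "metric_system": [72.0, 170.0, 83.0, 187.0],
    "imperial_system": [159.0, 5.58, 183.0, 6.14],
})

# One row per player, one column per variable, filled with metric values.
fifa_metric = fifa.pivot(index="name", columns="variable", values="metric_system")
print(fifa_metric)
```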
14. Pivoting multiple columns
We could also pass a list of two column names to the values argument of the pivot method.
In this case, the resulting DataFrame has a hierarchical column index with both column names, as we see in the example.
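As a sketch of this variant, we can pass both value columns as a list. The data is again a placeholder, with column names assumed from the narration.

```python
import pandas as pd

# Placeholder long-format data; column names are assumed from the narration.
fifa = pd.DataFrame({
    "name": ["Player A", "Player A", "Player B", "Player B"],
    "variable": ["weight", "height", "weight", "height"],
    "metric_system": [72.0, 170.0, 83.0, 187.0],
    "imperial_system": [159.0, 5.58, 183.0, 6.14],
})

# A list in `values` produces two-level columns:
# the outer level is the value column, the inner level is the variable.
fifa_both = fifa.pivot(
    index="name",
    columns="variable",
    values=["metric_system", "imperial_system"],
)
print(fifa_both)

# A single cell is addressed with a (value column, variable) tuple.
print(fifa_both.loc["Player A", ("metric_system", "weight")])
```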
15. Pivoting multiple columns
What if we want to extend the pivot method to all the value columns in the DataFrame instead of just one or two?
We can do this easily by omitting the values argument.
16. Pivoting multiple columns
So we'll apply the pivot method to the previous fifa DataFrame, omitting the values parameter.
We see in the code that we obtain the same result as before. The new DataFrame has the metric and imperial system values. Also, it has a hierarchical column index to distinguish both cases.
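A minimal sketch of omitting the values argument, using the same placeholder data and assumed column names as before:

```python
import pandas as pd

# Placeholder long-format data; column names are assumed from the narration.
fifa = pd.DataFrame({
    "name": ["Player A", "Player A", "Player B", "Player B"],
    "variable": ["weight", "height", "weight", "height"],
    "metric_system": [72.0, 170.0, 83.0, 187.0],
    "imperial_system": [159.0, 5.58, 183.0, 6.14],
})

# With no `values`, every remaining column is pivoted,
# so both measurement systems appear under a hierarchical column index.
fifa_all = fifa.pivot(index="name", columns="variable")
print(fifa_all)
```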
17. Duplicate entries error
Passing only the index and columns arguments to the pivot method will work in most cases.
But let's see the following example.
18. Duplicate entries error
Pay attention to the third and fifth rows. The values for name and variable are the same, but the value for the imperial system is different.
19. Duplicate entries error
We apply the pivot method, passing only index and columns. Because it doesn't know which of the two values belongs in the cell, pandas will raise a ValueError.
We can choose to delete one of the rows, for example, the fifth row. We can use the drop() method passing the index and setting axis to zero. We now get a pivoted DataFrame because no repeated value is found.
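The following sketch reproduces the situation with placeholder data: the third and fifth rows (index labels 2 and 4) share the same name and variable. Which row to drop is a judgment call; here we drop the fifth one, as in the narration.

```python
import pandas as pd

# Placeholder data: rows with labels 2 and 4 repeat the same name/variable pair.
fifa_dup = pd.DataFrame({
    "name": ["Player A", "Player A", "Player B", "Player B", "Player B"],
    "variable": ["weight", "height", "weight", "height", "weight"],
    "metric_system": [72.0, 170.0, 83.0, 187.0, 83.0],
    "imperial_system": [159.0, 5.58, 183.0, 6.14, 185.0],
})

# Pivoting fails because two rows compete for the same cell.
try:
    fifa_dup.pivot(index="name", columns="variable")
    raised = False
except ValueError as err:
    raised = True
    print("pivot failed:", err)

# Drop the fifth row (index label 4) along axis 0, then pivot again.
fifa_no_dup = fifa_dup.drop(4, axis=0)
fifa_pivot = fifa_no_dup.pivot(index="name", columns="variable")
print(fifa_pivot)
```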
20. Let's practice!
And now, let's put the pivot method into action.