Pivoting and unpivoting
1. Pivoting and unpivoting
In our final video, we'll learn to reshape data by pivoting and unpivoting DataFrames.2. Long vs. wide format
Data can come in different formats which we can think of as long or wide format. In long format, each row is a single observation. In this example, each row is a count of a specific bedroom type per property. In wide format, rows contain multiple observations across columns. In this case, we have the bedroom counts by type in its own column. We may need to reshape our data from one format to another for different tasks, such as visualizations.3. Long vs. wide format
Let's see how to transform from long to wide format with pivot.4. Pivoting from long to wide
To transform from long to wide format, we use the .pivot() method. This reshapes data so values from one column become new column names.5. Pivoting from long to wide
The "on" parameter in .pivot specifies which column's values will become our new column names - in this case, "bedroom_type".6. Pivoting from long to wide
The "index" parameter defines which column identifies unique rows in our output - here we use "name".7. Pivoting from long to wide
The "values" parameter indicates which column contains the values for our table cells - in this case, "count". Our result is now in wide format. Each row now represents one property, each bedroom type is a column, and the counts fill the cells. This makes comparing bedroom types across properties much easier.8. Pivoting with aggregations
What if multiple values exist for the same index and column combination? Imagine that we had one row per bedroom for each property. Here we see that the Waves property has two rows where the bedroom_type is single.9. Pivoting with aggregations
In this case, we add the "aggregate_function" parameter in .pivot to combine multiple values. Common options include "sum", "mean", "min" and "max". Here we use "sum" to add all bedroom counts for each property and type. Notice, for example, that Waves has 2 single beds in the output.10. Unpivoting from wide to long
Now we do the reverse operation which converts wide data back to long format, suitable for visualizations.11. Unpivoting from wide to long
Polars provides the .unpivot method for this operation. In Pandas, this method is called .melt.12. Unpivoting from wide to long
The "index" parameter in unpivot specifies which columns remain unchanged - here the name column stays as our identifier.13. Unpivoting from wide to long
The "on" parameter lists which columns to convert into rows - here we're unpivoting the "doubles" and "singles" columns. We then get the output in long format. The variable column shows which column the data came from and the value column shows the data.14. Unpivoting from wide to long
We can also choose to set the name of the variable and value columns in the output with the variable_name and value_name arguments to unpivot. Here we set them to bedroom_type and count respectively.15. Let's practice!
Now it's time for you to practice reshaping your data.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.