1. Wide to long function
In addition to melt, pandas has another function that can help us reshape data from wide to long format: the wide_to_long function.
2. Wide to long transformation
Let's look at the following DataFrame. Some of the column names are similar: two columns start with age, and two start with weight. Each pair holds the same variable measured in different years.
3. Wide to long transformation
If we want to transform it into a long DataFrame like the one in the slide, with separate age and weight columns and one row per name and year, we cannot do it directly with melt. We need another function: wide_to_long. Notice that this is a pandas function (pd.wide_to_long), not a DataFrame method.
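To see why melt alone falls short here, the sketch below builds a small people DataFrame. Its names, values, and column labels such as age2019 are made up for illustration, since the slide's data isn't reproduced in this transcript. melt stacks every wide column into a single value column, so ages and weights end up mixed together and the year stays inside the variable name.

```python
import pandas as pd

# Hypothetical wide data: age and weight, each recorded for 2019 and 2020
people = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "age2019": [30, 41],
    "age2020": [31, 42],
    "weight2019": [60, 80],
    "weight2020": [62, 79],
})

# melt puts every non-id column into one "variable"/"value" pair,
# so ages and weights share a single value column
melted = people.melt(id_vars="name")
print(melted)
```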
4. Wide to long function
This function takes several arguments. The first one is the DataFrame we want to transform.
5. Wide to long function
The next one is the stubnames argument. With it, we specify the prefixes, that is, how the names of the wide columns start. In our example, the columns start with age and weight.
6. Wide to long function
The j argument sets the name of the new column that will hold the suffixes, the part at the end of the wide column names. In our case, we will call it year.
7. Wide to long function
Finally, the i argument takes the column or list of columns to use as unique identifiers; in our case, name. Notice that this column, together with the new j column, becomes the index of the long DataFrame.
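Putting the four arguments together, a minimal sketch on the same kind of hypothetical people data (the column labels age2019, weight2020, and so on are assumptions, not the slide's exact names) might look like this:

```python
import pandas as pd

# Hypothetical wide data (column labels like age2019 are assumed)
people = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "age2019": [30, 41],
    "age2020": [31, 42],
    "weight2019": [60, 80],
    "weight2020": [62, 79],
})

long_people = pd.wide_to_long(
    people,                       # the DataFrame to reshape
    stubnames=["age", "weight"],  # prefixes of the wide columns
    i="name",                     # column(s) that uniquely identify each row
    j="year",                     # name for the new column built from the suffixes
)
print(long_people)
```

The result has one row per name and year, with separate age and weight columns, and (name, year) as its index; pandas also converts the numeric suffixes to integers.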
8. Reshaping data
Let's see an example. We have the following dataset.
9. Reshaping data
We apply the wide_to_long function, passing in the books DataFrame,
10. Reshaping data
telling pandas our columns have the prefixes ratings and sold,
11. Reshaping data
and that we want to call the new column holding the suffixes year,
12. Reshaping data
and that the title column is the unique identifier. In the output we can see our new long DataFrame: title and year now form the index, while the ratings and sold columns contain the values for each year.
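The books example can be sketched the same way. The titles, numbers, and column labels (ratings2019, sold2020, and so on) are hypothetical stand-ins for the slide's dataset.

```python
import pandas as pd

# Hypothetical books data: ratings and sold, each recorded for 2019 and 2020
books = pd.DataFrame({
    "title": ["Dune", "Emma"],
    "ratings2019": [4.5, 4.1],
    "ratings2020": [4.6, 4.0],
    "sold2019": [120, 80],
    "sold2020": [150, 70],
})

# Prefixes ratings/sold, suffixes go into "year", title identifies each row
books_long = pd.wide_to_long(books, stubnames=["ratings", "sold"], i="title", j="year")
print(books_long)
print(books_long.index.names)  # title and year form a MultiIndex
```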
13. DataFrame with index
It is important to mention that if our DataFrame has a named index, as in this example, and we apply the wide_to_long function, the resulting DataFrame will not keep the original index.
14. DataFrame with index
If we want to keep it, we first reset the index of the original DataFrame without dropping it, so that it becomes a regular column. Then we apply the transformation, including that new column in the i argument. As we can see in the output, title is now part of the long DataFrame.
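Here is a sketch of both cases, assuming a hypothetical books DataFrame whose named index is title and which also has an author column (an assumption) to serve as the identifier in the first call.

```python
import pandas as pd

# Hypothetical data: title is a named index, author is a regular column
books_idx = pd.DataFrame(
    {
        "author": ["Herbert", "Austen"],
        "ratings2019": [4.5, 4.1],
        "ratings2020": [4.6, 4.0],
        "sold2019": [120, 80],
        "sold2020": [150, 70],
    },
    index=pd.Index(["Dune", "Emma"], name="title"),
)

# Applied directly, the original title index does not survive
lost = pd.wide_to_long(books_idx, stubnames=["ratings", "sold"], i="author", j="year")

# reset_index() (drop=False by default) turns title into a column,
# which we then include in i so it becomes part of the new index
kept = pd.wide_to_long(
    books_idx.reset_index(),
    stubnames=["ratings", "sold"],
    i=["title", "author"],
    j="year",
)
print(kept)
```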
15. sep argument
This new DataFrame is very similar to the previous one, but the column names contain an underscore between the prefix (ratings or sold) and the suffix (the year).
16. sep argument
If we apply the transformation as before, we'll get an empty DataFrame. This happens because pandas doesn't find any columns matching the expected pattern: by default, it assumes that the prefix is immediately followed by a numeric suffix (sep defaults to an empty string, and suffix to one or more digits).
17. sep argument
To fix this, we use the sep argument and set it to an underscore. Now pandas understands that the prefixes ratings and sold are separated from the year by an underscore, and it returns the correct DataFrame.
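A sketch with hypothetical underscore-separated column names such as ratings_2019 (the data values are made up):

```python
import pandas as pd

# Hypothetical data: an underscore sits between each prefix and its year
books_sep = pd.DataFrame({
    "title": ["Dune", "Emma"],
    "ratings_2019": [4.5, 4.1],
    "ratings_2020": [4.6, 4.0],
    "sold_2019": [120, 80],
    "sold_2020": [150, 70],
})

# sep="_" tells pandas to expect "ratings_" / "sold_" before the suffix
books_sep_long = pd.wide_to_long(
    books_sep, stubnames=["ratings", "sold"], i="title", j="year", sep="_"
)
print(books_sep_long)
```

The new value columns are still named ratings and sold, without the trailing underscore.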
18. suffix argument
Finally, if the names of the wide columns do not end in a number, as in the DataFrame in the slide,
19. suffix argument
and we apply the same transformation as before, we'll get an empty DataFrame, since pandas assumes the suffixes are numeric.
20. suffix argument
To solve this, we use the suffix argument and pass the regular expression r"\w+" (backslash w plus). This expression matches one or more word characters (letters, digits, or underscores), so it tells pandas that the column names can end in a word rather than a number. Now pandas recognizes the column names and returns the correct DataFrame.
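A sketch with hypothetical non-numeric suffixes, hardcover and paperback, combined with the underscore separator; the data and the j name format are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data: the suffixes are words, not years
books_fmt = pd.DataFrame({
    "title": ["Dune", "Emma"],
    "ratings_hardcover": [4.5, 4.1],
    "ratings_paperback": [4.6, 4.0],
    "sold_hardcover": [120, 80],
    "sold_paperback": [150, 70],
})

# suffix=r"\w+" lets the suffix be any run of word characters
books_fmt_long = pd.wide_to_long(
    books_fmt,
    stubnames=["ratings", "sold"],
    i="title",
    j="format",
    sep="_",
    suffix=r"\w+",
)
print(books_fmt_long)
```

Because these suffixes are not numeric, the format level keeps them as strings.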
21. Let's practice!
You have learned how to use the wide_to_long function. It's time to practice!