
Reshaping with melt

1. Reshaping with melt

In this lesson, we are going to learn how to reshape a DataFrame from a wide to a long format using the melt function.

2. Wide to long transformation

Imagine we want to perform advanced analytics or plot different variables in the same graph. These tasks will require the data to be in a long format.

3. Wide to long transformation

But data is often stored in a wide format. So how do we reshape it? Pandas provides us with a very flexible function, the melt function.

4. Melt

When using melt on a DataFrame, the first argument to set is id_vars.

5. Melt

This argument takes the names of the columns to use as identifier variables. In our example, the values we want to use as identifiers are the columns "first" and "last". These two columns appear in the long format and will help us match all the records for the same observation.

6. Melt

The rest of the columns are melted. The column named "variable" now holds the name of each melted column, with one row per observation and variable,

7. Melt

and their corresponding values are now located in the column named "value".
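
As a minimal sketch of the steps described so far, the snippet below melts a small wide DataFrame while keeping "first" and "last" as identifiers. The DataFrame name `people`, the "height" and "weight" columns, and all the values are made up for illustration; only `id_vars` and the resulting "variable" and "value" columns come from the lesson.

```python
import pandas as pd

# Hypothetical wide data: one row per person, one column per measurement.
people = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Keep "first" and "last" as identifiers; every other column is melted.
long_people = people.melt(id_vars=["first", "last"])

# The result has one row per person and melted column, with the column
# name stored in "variable" and its content stored in "value".
print(long_people)
```

Each person now appears once per melted column, so the two original rows become four.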

8. Melting data

Let's see an example. We have the following DataFrame containing different features about books. We apply melt, setting the id_vars argument to the column "title". We can see in the output that each row contains a book title and only one of its features.
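
The books DataFrame from the slide is not reproduced in this transcript, so the sketch below uses a stand-in. The titles, the values, and the "rating" column are assumptions; "title", "language" and "pages" are the column names mentioned in the lesson.

```python
import pandas as pd

# Stand-in for the books DataFrame shown on the slide (values are invented).
books = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "language": ["Spanish", "English"],
    "pages": [374, 391],
    "rating": [4.3, 4.0],  # hypothetical extra feature
})

# Use "title" as the identifier; all remaining columns are melted.
books_long = books.melt(id_vars="title")
print(books_long)
```

With two titles and three melted columns, the long DataFrame has six rows.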

9. Values and variables

What can we do if we do not want to melt all the columns, or if we want different names for the new columns? Luckily, we can use other arguments for that purpose: value_vars, var_name and value_name.

10. Values and variables

The value_vars argument takes the names of the columns we want to melt. This can be only one column or a list of many columns. In our example, we specify age and height as columns to melt. So the long DataFrame contains only those features.
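
A short sketch of value_vars, assuming a made-up DataFrame with "age", "height" and "weight" columns. Only "age" and "height" are listed in value_vars, so "weight" does not appear in the result.

```python
import pandas as pd

# Hypothetical wide data with three measurement columns.
people = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "age": [34, 28],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Melt only "age" and "height"; "weight" is neither an identifier
# nor listed in value_vars, so it is dropped from the long DataFrame.
subset_long = people.melt(
    id_vars=["first", "last"],
    value_vars=["age", "height"],
)
print(subset_long)
```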

11. Values and variables

The var_name argument takes the name to use for the column "variable". In our case, we name it "feature".

12. Values and variables

Lastly, the argument value_name takes the name to use for the column "value". As you can see in the slide, we set it to "amount".
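
The renaming arguments can be sketched as follows, again on invented data. The names "feature" and "amount" are the ones used in the lesson; the DataFrame itself is an assumption.

```python
import pandas as pd

# Hypothetical wide data.
people = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "age": [34, 28],
    "height": [5.5, 6.0],
})

# var_name renames the "variable" column and value_name renames "value".
renamed_long = people.melt(
    id_vars=["first", "last"],
    value_vars=["age", "height"],
    var_name="feature",
    value_name="amount",
)
print(renamed_long.columns.tolist())
```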

13. Specifying values to melt

Let's use our books DataFrame again. We apply melt again, but this time, we specify that we only want to melt the columns "language" and "pages". As you can see in the output, we have one row for each book and feature, but now only the language and the pages appear.
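
A sketch of this call on a stand-in books DataFrame. The titles, the values, and the "rating" column are assumptions, included only to show that columns left out of value_vars are dropped.

```python
import pandas as pd

# Stand-in for the books DataFrame (values are invented).
books = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "language": ["Spanish", "English"],
    "pages": [374, 391],
    "rating": [4.3, 4.0],  # hypothetical column that will not be melted
})

# Melt only "language" and "pages", keeping "title" as the identifier.
books_subset = books.melt(id_vars="title", value_vars=["language", "pages"])
print(books_subset)
```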

14. Naming values and variables

We can go ahead and customize the new DataFrame. So, we specify that the new column "variable" should be called "feature". We also want the column "value" to be named "code". The output DataFrame contains the same data as before, but this time, the names of the new columns have changed.
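
Putting the arguments together, a minimal sketch on a stand-in books DataFrame might look like this. The data is invented; the column names "feature" and "code" follow the lesson.

```python
import pandas as pd

# Stand-in for the books DataFrame (values are invented).
books = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "language": ["Spanish", "English"],
    "pages": [374, 391],
})

# Same melt as before, but with custom names for the two new columns.
books_named = books.melt(
    id_vars="title",
    value_vars=["language", "pages"],
    var_name="feature",
    value_name="code",
)
print(books_named)
```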

15. Let's practice!

Now you know how to melt a DataFrame. It's time to put this into practice!