
Transforming a list-like column

1. Transforming a list-like column

We have learned several methods to reshape DataFrames. In most of the cases, our columns contained single values.

2. List-like columns

But we could have a column that contains values in a list, such as the zip code column you see in the slide. This type of column is called a list-like column. As you can imagine, it's hard to work with them in this format.

3. Transforming list-like columns

The best approach is to transform each element of a list-like column into a separate row.

4. The .explode() method

Pandas provides us with the explode() method that can help us with this operation.

5. Exploding a column

Imagine we have the following DataFrame. Again, the zip code is a list-like column.

6. Exploding a column

We can explode the values of zip code into separate rows. For that, we select the zip code column from the DataFrame and call the explode() method. As a result, we get a pandas Series with each value in a different row. As you can see, the index values from the original rows are replicated.
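The step described above can be sketched as follows. The slide's actual data is not part of this transcript, so the cities DataFrame and its values here are made up for illustration, as is the variable name cities_explode.

```python
import pandas as pd

# Hypothetical sample data standing in for the DataFrame on the slide.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid"],
    "country": ["USA", "Spain"],
    "zip_code": [["90001", "90004"], ["28001", "28004", "28005"]],
})

# Select the list-like column and explode it: one list element per row.
cities_explode = cities["zip_code"].explode()

# The result is a Series whose index repeats the original row labels.
print(cities_explode)
```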

7. Exploding a column

Now, let's get this information back in the original DataFrame. First, we select the rest of the columns: city and country.

8. Exploding a column

We then apply the merge() method and pass the exploded series. This method will join both data structures together. We need to specify how to join them.

9. Exploding a column

Because the index is replicated, we can use this to track the original row. We set the parameters left_index and right_index to True. It will tell pandas to join the rows with the same index. As a result, we get the original DataFrame but with each value of zip_code in a separate row.
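Here is a minimal sketch of the select-then-merge steps, assuming the same kind of cities DataFrame. The data and the names cities_explode and cities_merged are illustrative and not taken from the slides.

```python
import pandas as pd

# Hypothetical sample data standing in for the DataFrame on the slide.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid"],
    "country": ["USA", "Spain"],
    "zip_code": [["90001", "90004"], ["28001", "28004", "28005"]],
})

# Exploded Series: keeps the name "zip_code" and the replicated index.
cities_explode = cities["zip_code"].explode()

# Take the remaining columns and join on the index of both objects.
cities_merged = cities[["city", "country"]].merge(
    cities_explode, left_index=True, right_index=True
)
print(cities_merged)
```

The merge works with a Series on the right-hand side because the Series carries a name, which becomes the column name in the result.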

10. Exploding a column in the DataFrame

There is a faster way to do it. We can explode the zip code column directly in the cities DataFrame. For that, we call the explode() method on the whole DataFrame and pass the name of the column we want to explode. As a result, we get a DataFrame with each value of zip_code in a separate row. Again, this operation replicates the index values.
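The one-step version might look like this. The sample data is again hypothetical, and cities_exploded is just an illustrative name.

```python
import pandas as pd

# Hypothetical sample data standing in for the DataFrame on the slide.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid"],
    "country": ["USA", "Spain"],
    "zip_code": [["90001", "90004"], ["28001", "28004", "28005"]],
})

# Explode the list-like column directly on the DataFrame.
# The other columns are repeated for every element of each list.
cities_exploded = cities.explode("zip_code")
print(cities_exploded)
```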

11. Exploding a column in the DataFrame

We can use the reset_index method with drop=True to discard the replicated index instead of keeping it. Now the DataFrame has a unique index.
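A short sketch of resetting the index after exploding, using made-up sample data. Note that reset_index returns a new DataFrame, so the result is assigned to a variable (cities_reset is an illustrative name).

```python
import pandas as pd

# Hypothetical sample data standing in for the DataFrame on the slide.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid"],
    "country": ["USA", "Spain"],
    "zip_code": [["90001", "90004"], ["28001", "28004", "28005"]],
})

# drop=True discards the replicated index instead of adding it as a column.
cities_reset = cities.explode("zip_code").reset_index(drop=True)
print(cities_reset)

# In pandas 1.1 and later, explode can renumber the rows itself.
cities_ignore = cities.explode("zip_code", ignore_index=True)
```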

12. Empty lists

In the following DataFrame, notice the empty list in the second row. The explode method replaces the empty list with a NaN value, so we have to be careful with newly generated missing values.
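To see the effect of an empty list, here is a small sketch with hypothetical data in which the second row has no zip codes. Counting the missing values afterwards is one simple way to check for them.

```python
import pandas as pd

# Hypothetical sample data: the second row holds an empty list.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid", "New York"],
    "zip_code": [["90001", "90004"], [], ["10001"]],
})

# The empty list still produces one row, with NaN as its value.
cities_exploded = cities.explode("zip_code")
print(cities_exploded)

# Count the missing values introduced by the operation.
n_missing = cities_exploded["zip_code"].isna().sum()
print(n_missing)
```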

13. Chaining operations

Here is the cities DataFrame again, but this time, the zip code column does not contain lists. It has comma-separated strings. How can we expand this column?

14. Chaining operations

We have seen how to split strings using the split() method of the str accessor. But with expand=True, this leads to separate columns. We want separate rows. Let's see how we can achieve that.
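For comparison, this is roughly what splitting into columns looks like, assuming comma-separated zip codes without spaces. The data and the name split_columns are illustrative.

```python
import pandas as pd

# Hypothetical sample data: zip codes stored as comma-separated strings.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid"],
    "zip_code": ["90001,90004", "28001,28004,28005"],
})

# expand=True spreads the pieces across columns, one column per position.
# Rows with fewer pieces are padded with missing values.
split_columns = cities["zip_code"].str.split(",", expand=True)
print(split_columns)
```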

15. Chaining operations

We will call the assign() method on the cities DataFrame. This method allows us to assign values to columns. We specify we want to set the zip code column

16. Chaining operations

to the values after splitting the zip code column.

17. Chaining operations

Then, we chain the explode() method onto this operation. As a result, we get the zip code values in separate rows, as we wanted.
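The full chain can be sketched like this, assuming comma-separated strings without spaces. The sample data and the name cities_rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: zip codes stored as comma-separated strings.
cities = pd.DataFrame({
    "city": ["Los Angeles", "Madrid"],
    "country": ["USA", "Spain"],
    "zip_code": ["90001,90004", "28001,28004,28005"],
})

# 1) assign() replaces zip_code with lists produced by str.split(",").
# 2) explode() then turns each list element into its own row.
cities_rows = (
    cities
    .assign(zip_code=cities["zip_code"].str.split(","))
    .explode("zip_code")
)
print(cities_rows)
```

Because assign() returns a new DataFrame, the original cities DataFrame still holds the comma-separated strings after the chain runs.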

18. Let's practice!

Now, you are ready to explode list-like columns!