Dealing with nested data columns

1. Dealing with nested data columns

In the previous video, we saw

2. Review

how to read a nested JSON into a DataFrame using the json_normalize function. Now, we'll examine how to deal with nested data inside a DataFrame column.

3. Nested data in columns

Imagine we have a list named writers, and we define a list of JSON strings named books. Now, we can convert them into a DataFrame. Inside the DataFrame call,

4. Nested data in columns

we'll call the dictionary constructor and

5. Nested data in columns

set the writers list as the writers column and the list of JSON strings named books as the books column. We now have a DataFrame with two columns. The books column contains the nested data.
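Here is a minimal sketch of that construction. The sample values and the name collection are assumptions made for illustration; only the names writers and books come from the video.

```python
import pandas as pd

# Hypothetical sample data: the actual values are not shown in the transcript.
writers = ["Mary Shelley", "Ernest Hemingway"]
books = [
    '{"title": "Frankenstein", "year": 1818}',
    '{"title": "The Old Man and the Sea", "year": 1952}',
]

# Build the DataFrame from a dictionary that maps column names to lists.
collection = pd.DataFrame(dict(writers=writers, books=books))
print(collection)
```

Each value in the books column is still a plain string at this point, so its keys cannot be accessed as columns yet.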

6. Converting nested data

In order to handle that, let's import the json module. Now, we will load the JSON string contained in books into a dictionary.

7. Converting nested data

To do so, let's use the apply() method. This method applies a function to every value of the "books" column.

8. Converting nested data

Which function? The json.loads function. It converts a JSON string into a Python dictionary.
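As a rough illustration, assuming a small books Series with made-up values, the call could look like this:

```python
import json
import pandas as pd

# Hypothetical JSON strings standing in for the "books" column.
books = pd.Series(
    [
        '{"title": "Frankenstein", "year": 1818}',
        '{"title": "The Old Man and the Sea", "year": 1952}',
    ],
    name="books",
)

# apply() calls json.loads once per value, turning each string into a dict.
parsed = books.apply(json.loads)
print(type(parsed.iloc[0]))
print(parsed.iloc[0]["title"])
```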

9. Converting nested data

Then we will convert each resulting dictionary into a Series object. To do that, we use the apply() method again, this time passing the Series constructor from pandas as the argument. This operation flattens the dictionaries out into a DataFrame. Pay attention to how it creates one column per key.
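The two chained apply() calls can be sketched as follows. The data and the name books_flat are hypothetical, and this is only one way to write it:

```python
import json
import pandas as pd

# Hypothetical JSON strings standing in for the "books" column.
books = pd.Series(
    [
        '{"title": "Frankenstein", "year": 1818}',
        '{"title": "The Old Man and the Sea", "year": 1952}',
    ],
    name="books",
)

# First apply: JSON string -> dict. Second apply: dict -> Series,
# which pandas expands into one column per dictionary key.
books_flat = books.apply(json.loads).apply(pd.Series)
print(books_flat)
```

The result is a DataFrame whose columns are the dictionary keys, here title and year.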

10. Concatenate back

We can then drop the original nested column. After that, we concatenate what remains with the newly generated DataFrame. As a result, we get a DataFrame with three columns and no nested data.
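Putting the drop and the concatenation together might look like the sketch below. The sample data and the names collection, books_flat and result are assumptions for illustration.

```python
import json
import pandas as pd

# Hypothetical sample data, as in the earlier steps.
collection = pd.DataFrame(
    {
        "writers": ["Mary Shelley", "Ernest Hemingway"],
        "books": [
            '{"title": "Frankenstein", "year": 1818}',
            '{"title": "The Old Man and the Sea", "year": 1952}',
        ],
    }
)

# Flatten the nested column into its own DataFrame.
books_flat = collection["books"].apply(json.loads).apply(pd.Series)

# Drop the nested column, then join the two pieces side by side.
# axis=1 concatenates along columns, aligning rows on the shared index.
collection = collection.drop(columns="books")
result = pd.concat([collection, books_flat], axis=1)
print(result)
```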

11. Dumping nested data

There is another approach we can take.

12. Dumping nested data

After loading the JSON strings contained in books, we can use the to_list method to transform the result into a list of dictionaries. We can then use the json.dumps function to serialize that list into a single JSON string. Finally, we use the read_json function to read this string into a DataFrame. Our new DataFrame has two columns, one for each key in the original JSON strings.
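A sketch of this second approach, again with made-up data. Two details here are assumptions rather than something stated in the video: the string is wrapped in io.StringIO, because recent pandas versions warn when read_json receives a literal JSON string, and orient="records" is passed explicitly to match the list-of-dictionaries layout.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical JSON strings standing in for the "books" column.
books = pd.Series(
    [
        '{"title": "Frankenstein", "year": 1818}',
        '{"title": "The Old Man and the Sea", "year": 1952}',
    ],
    name="books",
)

# Parse every string, then collect the dictionaries in a plain list.
books_list = books.apply(json.loads).to_list()

# Serialize the whole list back into one JSON string.
books_dump = json.dumps(books_list)

# Read that string as a table: one row per dictionary, one column per key.
new_books = pd.read_json(StringIO(books_dump), orient="records")
print(new_books)
```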

13. Dumping nested data

We can now concatenate this new DataFrame with the writers column of the original DataFrame, and we get the same DataFrame as before.
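For completeness, that last concatenation could be written like this. The data and variable names are hypothetical, and the StringIO wrapper is an assumption added to suit recent pandas versions.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical sample data, as in the earlier steps.
collection = pd.DataFrame(
    {
        "writers": ["Mary Shelley", "Ernest Hemingway"],
        "books": [
            '{"title": "Frankenstein", "year": 1818}',
            '{"title": "The Old Man and the Sea", "year": 1952}',
        ],
    }
)

# Rebuild the flattened books table via json.dumps and read_json.
books_dump = json.dumps(collection["books"].apply(json.loads).to_list())
new_books = pd.read_json(StringIO(books_dump), orient="records")

# Selecting a single column yields a Series; concat accepts it next to a DataFrame.
result = pd.concat([collection["writers"], new_books], axis=1)
print(result)
```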

14. Let's practice!

Now, you know how to handle nested data in DataFrame columns. It's time to practice!