1. Dealing with nested data columns
In the previous video, we saw
2. Review
how to read nested JSON into a DataFrame using the json_normalize function. Now, we'll examine how to deal with nested data inside a DataFrame column.
3. Nested data in columns
Imagine we have a list named writers, and we define a list of JSON strings named books. Now, we can convert them into a DataFrame. Inside the DataFrame call,
4. Nested data in columns
we'll call the dictionary constructor and
5. Nested data in columns
set the writers list as the writers column and the books list of JSON strings as the books column. We now have a DataFrame with two columns. The books column contains the nested data.
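As a minimal sketch of this setup, we could write something like the following. The two writers, their books, and the DataFrame name collection are illustrative values chosen for this example, not data confirmed by the video.

```python
import pandas as pd

# Illustrative data (assumed for this sketch): one JSON string per writer
writers = ["Mary Shelley", "Ernest Hemingway"]
books = [
    '{"title": "Frankenstein", "year": 1818}',
    '{"title": "The Old Man and the Sea", "year": 1952}',
]

# Build the DataFrame from a dictionary of column name -> list
collection = pd.DataFrame(dict(writers=writers, books=books))
print(collection)
```

Each value in the books column is still a plain string at this point, which is why we cannot access the title or the year directly.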
6. Converting nested data
To handle that, let's import the json module. Now, we will
load each JSON string contained in the books column into a dictionary.
7. Converting nested data
To do that, let's use the apply() method. This method applies a function to every single value of the "books" column.
8. Converting nested data
Which function? The json.loads function. It converts a JSON string into a Python dictionary.
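Here is a short sketch of that step on its own. The sample data and the names collection and parsed are assumptions made for illustration.

```python
import json

import pandas as pd

# Illustrative data (assumed for this sketch)
collection = pd.DataFrame(
    {
        "writers": ["Mary Shelley", "Ernest Hemingway"],
        "books": [
            '{"title": "Frankenstein", "year": 1818}',
            '{"title": "The Old Man and the Sea", "year": 1952}',
        ],
    }
)

# Apply json.loads to every value: each JSON string becomes a dictionary
parsed = collection["books"].apply(json.loads)
print(type(parsed.iloc[0]))
print(parsed.iloc[0]["title"])
```

The result is still a single column, but every value is now a dictionary whose keys we can look up.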
9. Converting nested data
Then we will convert each resulting dictionary into a Series object. To do that, we again use the apply() method, passing the Series constructor from pandas as an argument.
This operation flattens the dictionaries into a DataFrame. Pay attention to how it creates one column per key.
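Chaining the two apply() calls could look like the sketch below. The sample data and the name books_flat are assumptions for this example.

```python
import json

import pandas as pd

# Illustrative data (assumed for this sketch)
collection = pd.DataFrame(
    {
        "writers": ["Mary Shelley", "Ernest Hemingway"],
        "books": [
            '{"title": "Frankenstein", "year": 1818}',
            '{"title": "The Old Man and the Sea", "year": 1952}',
        ],
    }
)

# First apply: JSON string -> dictionary
# Second apply: dictionary -> Series, so the keys become columns
books_flat = collection["books"].apply(json.loads).apply(pd.Series)
print(books_flat)
```

The new DataFrame keeps the original row index, which matters when we combine it with the other columns later.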
10. Concatenate back
We can then drop the original nested column from the original DataFrame. After that, we concatenate the remaining column with the newly generated DataFrame along the columns. As a result, we get a DataFrame with three columns without any nested data.
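Putting the drop and the concatenation together, a minimal sketch might be the following. The sample data and variable names are assumptions made for illustration.

```python
import json

import pandas as pd

# Illustrative data (assumed for this sketch)
collection = pd.DataFrame(
    {
        "writers": ["Mary Shelley", "Ernest Hemingway"],
        "books": [
            '{"title": "Frankenstein", "year": 1818}',
            '{"title": "The Old Man and the Sea", "year": 1952}',
        ],
    }
)

# Flatten the nested column into one column per key
books_flat = collection["books"].apply(json.loads).apply(pd.Series)

# Drop the nested column, then concatenate side by side (axis=1)
collection_flat = pd.concat(
    [collection.drop(columns="books"), books_flat], axis=1
)
print(collection_flat)
```

We pass axis=1 so that pandas aligns the two pieces on the row index and places them next to each other instead of stacking them.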
11. Dumping nested data
There is another approach we can take.
12. Dumping nested data
After loading the JSON strings contained in books, we can use the to_list method to transform the result into a list of dictionaries. We can then use the json.dumps function to transform that list into a single JSON string.
And finally, we use the read_json function to read this string into a DataFrame. Our new DataFrame has two columns that correspond to the keys of the original JSON strings.
13. Dumping nested data
We can now concatenate this new DataFrame with the writers column of the original DataFrame, and we get the same DataFrame as before.
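This second approach could be sketched as follows. The sample data and variable names are assumptions for this example. Wrapping the string in StringIO is also our choice here, since recent pandas versions deprecate passing a literal JSON string directly to read_json.

```python
import json
from io import StringIO

import pandas as pd

# Illustrative data (assumed for this sketch)
collection = pd.DataFrame(
    {
        "writers": ["Mary Shelley", "Ernest Hemingway"],
        "books": [
            '{"title": "Frankenstein", "year": 1818}',
            '{"title": "The Old Man and the Sea", "year": 1952}',
        ],
    }
)

# Parse every JSON string, then collect the dictionaries in a list
books_list = collection["books"].apply(json.loads).to_list()

# Dump the whole list into one JSON string (an array of objects)
books_json = json.dumps(books_list)

# Read that single string into a DataFrame, one column per key
new_books = pd.read_json(StringIO(books_json))

# Concatenate with the writers column of the original DataFrame
result = pd.concat([collection["writers"], new_books], axis=1)
print(result)
```

Here read_json builds a fresh default index, so the concatenation lines up with the original rows only because the original DataFrame also uses the default index.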
14. Let's practice!
Now, you know how to handle nested data in DataFrame columns. It's time to practice!