Working with nested JSONs

1. Working with nested JSONs

In the previous lessons, we focused on getting data from JSON files and APIs. Now, we'll turn our attention to restructuring nested JSON data.

2. Nested JSONs

Recall that JSONs contain objects with attribute-value pairs. A JSON is nested when a value is itself an object.

3. Nested JSONs

The Yelp data has several examples of nesting.

4. Nested JSONs

The coordinates and location attributes have objects for values.

5. Nested JSONs

Categories' values are lists of objects.

6. Nested JSONs

And these records are all nested under businesses.

7. Nested JSONs

Checking these columns in our dataframe, we see dictionaries. The data would be easier to work with if nested attributes were in their own columns. Fortunately, pandas has a function to flatten nested JSONs. Unfortunately, it's in a submodule that doesn't normally load when importing pandas.

8. pandas.io.json

The pandas dot io dot JSON submodule contains functions for reading and writing JSON beyond those in the primary pandas package, but it has to be imported explicitly. We'll use its JSON normalize function to flatten nested data. JSON normalize takes a dictionary or list of dictionaries. This means that JSONs that load with pd DataFrame will load with JSON normalize. It returns a dataframe. Nested attribute column names follow the default pattern attribute dot nested attribute. The dot separator interferes with pandas' dot notation for column selection, so it's advisable to specify a different separator, like underscore, with the sep argument.

9. Loading Nested JSON Data

Let's use JSON normalize to flatten nested data as we load it. We import pandas and Requests, plus JSON normalize from pandas dot io dot json. We create a variable for the API endpoint, as well as header and parameter dictionaries. Then we pass those to requests get, and isolate the data with the response's JSON method.

10. Loading Nested JSON Data

We pass the data nested under businesses to JSON normalize, specifying underscores as the separator, and assign the resulting dataframe to bookstores. Printing the dataframe's column names shows that nested attributes have their own columns, like coordinates underscore latitude.

11. Deeply Nested Data

Categories, however, is still nested.

12. Deeply Nested Data

There are a few approaches for deeply nested data. You may write a custom flattening function, or decide it's irrelevant to the analysis and leave it be. Another option is to use JSON normalize's record path, meta, and meta prefix arguments. Record path takes a string or list of strings of attributes to the nested data, like listing folders in a file path. Meta takes a list of higher-level attributes to include in the resulting dataframe. Nested data can be flattened by passing their record paths as sub-lists. To make clear what came from where and avoid duplicate column names, specify a meta prefix.

13. Deeply Nested Data

Let's see this in action with categories. We pass the businesses data to JSON normalize, and specify the separator. We set record path to categories. We get business names, aliases, ratings and coordinates with the meta argument. To flatten coordinates, we supply sub-lists containing paths to latitude and longitude. Businesses and categories both have alias columns, so we set a meta prefix to differentiate them.

14. Deeply Nested Data

Viewing the data, we see it's totally flat, but businesses with multiple categories repeat. Depending on the use case, this may be fine, or duplicates may need to be removed.

15. Let's practice!

As you can tell, working with JSON can get quite deep. The best way to get comfortable in those depths is practice.