1. Working with nested JSONs
In the previous lessons, we focused on getting data from JSON files and APIs. Now, we'll turn our attention to restructuring nested JSON data.
2. Nested JSONs
Recall that JSONs contain objects with attribute-value pairs.
A JSON is nested when a value is itself an object.
3. Nested JSONs
The Yelp data has several examples of nesting.
4. Nested JSONs
The coordinates and location attributes have objects for values.
5. Nested JSONs
Categories' values are lists of objects.
6. Nested JSONs
And these records are all nested under businesses.
7. Nested JSONs
Checking these columns in our dataframe, we see dictionaries. The data would be easier to work with if nested attributes were in their own columns.
Fortunately, pandas has a function to flatten nested JSONs. Unfortunately, it's in a submodule that doesn't normally load when importing pandas.
8. pandas.io.json
The pandas dot io dot JSON submodule contains functions for reading and writing JSON beyond those in the primary pandas package,
but it has to be imported explicitly.
We'll use its JSON normalize function to flatten nested data.
JSON normalize takes a dictionary or list of dictionaries. This means that JSONs that load with pd DataFrame will load with JSON normalize.
It returns a dataframe.
Nested attribute column names follow the default pattern attribute dot nested attribute.
The dot separator interferes with pandas' dot notation for column selection, so it's advisable to specify a different separator, like underscore, with the sep argument.
9. Loading Nested JSON Data
Let's use JSON normalize to flatten nested data as we load it.
We import pandas and Requests,
plus JSON normalize from pandas dot io dot json.
We create a variable for the API endpoint, as well as header and parameter dictionaries.
Then we pass those to requests get, and isolate the data with the response's JSON method.
10. Loading Nested JSON Data
We pass the data nested under businesses to JSON normalize, specifying underscores as the separator, and assign the resulting dataframe to bookstores.
Printing the dataframe's column names shows that nested attributes have their own columns, like coordinates underscore latitude.
11. Deeply Nested Data
Categories, however, is still nested.
12. Deeply Nested Data
There are a few approaches for deeply nested data. You may write a custom flattening function, or decide it's irrelevant to the analysis and leave it be.
Another option is to use JSON normalize's record path, meta, and meta prefix arguments.
Record path takes a string or list of strings of attributes to the nested data, like listing folders in a file path.
Meta takes a list of higher-level attributes to include in the resulting dataframe. Nested data can be flattened by passing their record paths as sub-lists.
To make clear what came from where and avoid duplicate column names, specify a meta prefix.
13. Deeply Nested Data
Let's see this in action with categories.
We pass the businesses data to JSON normalize, and specify the separator.
We set record path to categories.
We get business names, aliases, ratings and coordinates with the meta argument. To flatten coordinates, we supply sub-lists containing paths to latitude and longitude.
Businesses and categories both have alias columns, so we set a meta prefix to differentiate them.
14. Deeply Nested Data
Viewing the data, we see it's totally flat, but businesses with multiple categories repeat. Depending on the use case, this may be fine, or duplicates may need to be removed.
15. Let's practice!
As you can tell, working with JSON can get quite deep. The best way to get comfortable in those depths is practice.