1. Introduction to JSON
Until now, you've built pipelines for data in tabular formats. In this chapter, we'll shift our focus to data in JavaScript Object Notation, or JSON.
2. JavaScript Object Notation (JSON)
As the name implies, JSON is based on JavaScript, but most popular programming languages have their own versions of the data structures JSON uses, making it easy for programs to generate and parse while still being human-readable. And because of JavaScript's role in web development, JSON is a common format for transmitting data over the web.
Unlike dataframes, JSON data is not tabular. This makes for more efficient data storage -- if a value doesn't exist for a record, the attribute can be omitted instead of storing it with a null value.
In other words, records don't all have to have the same set of attributes.
Instead, data is organized into collections of objects.
Objects resemble Python dictionaries: they're enclosed in curly braces and contain attribute-value pairs.
One last feature of JSONs is that they can be nested -- values themselves can be objects, or lists of objects.
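To make these ideas concrete, here is a minimal sketch using Python's built-in json module. The records and attribute names are made up for illustration; they show two objects with different sets of attributes and one nested list of objects.

```python
import json

# Hypothetical JSON text: two records with different attribute sets,
# and one nested value (a list of objects).
raw = """
[
  {"name": "Ada", "age": 36, "pets": [{"kind": "cat", "name": "Tom"}]},
  {"name": "Grace"}
]
"""

# A JSON array becomes a Python list; JSON objects become dictionaries.
records = json.loads(raw)

for record in records:
    # Each record carries only the attributes it actually has.
    print(sorted(record.keys()))

# Nested values are reached by chaining keys and list indices.
first_pet = records[0]["pets"][0]["name"]
print(first_pet)
```

Notice that the second record simply leaves out age and pets rather than storing them as nulls.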
3. Reading JSON Data
You can guess the pandas function to load a JSON into a dataframe -- it's read_json.
read_json takes a string of the path to the JSON to load. This path can be to a file saved on a computer or a URL ending in .json. Alternatively, you can supply JSON data directly as a string, though recent pandas versions deprecate this in favor of wrapping the string in io.StringIO.
As with flat files and spreadsheets, pandas guesses attribute data types, but you can specify them by passing a dictionary that maps column names to data types to the dtype argument.
JSON data can be laid out in various ways, so there is an orient keyword argument to flag uncommon layouts.
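Here is a rough sketch of those options. The data, file name, and column names are invented for the example. It loads the same JSON once from a file path and once from in-memory text wrapped in StringIO, using dtype to keep a numeric-looking column as text.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

# Hypothetical data: zip codes are stored as strings with a leading zero.
json_text = '[{"zip_code": "02134", "count": 3}, {"zip_code": "10001", "count": 7}]'

# 1) Load from a file path. Without dtype, pandas guesses the types and
#    may turn numeric-looking strings into numbers.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.json"
    path.write_text(json_text, encoding="utf-8")
    from_file = pd.read_json(path)

# 2) Load from in-memory text, forcing zip_code to stay a string.
from_text = pd.read_json(StringIO(json_text), dtype={"zip_code": str})

print(from_file.dtypes)
print(from_text.dtypes)
```

Columns left out of the dtype dictionary, like count here, still get a guessed type.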
4. Data Orientation
Since JSON data isn't tabular, pandas makes assumptions about how it's arranged, or oriented, to load it into a dataframe.
pandas handles record and column orientations without any extra arguments, and these are the ones you'll encounter most often. Let's see what they look like.
5. Record Orientation
A record-oriented JSON consists of a list of dictionaries, each translating to a table record. For example, this JSON of causes of death from New York City's open data site is record oriented.
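As a small illustration, here is a record-oriented JSON with made-up values (not the actual New York City data) loaded with read_json and no orient argument.

```python
from io import StringIO

import pandas as pd

# Hypothetical record-oriented JSON: a list of objects, one per table row.
records_json = """
[
  {"year": 2019, "cause": "A", "deaths": 120},
  {"year": 2019, "cause": "B", "deaths": 85},
  {"year": 2020, "cause": "A", "deaths": 131}
]
"""

# Each object becomes a row; its keys become the column names.
df = pd.read_json(StringIO(records_json))
print(df)
```

The attribute names repeat in every record, which is what the column orientation avoids.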
6. Column Orientation
To reduce file size by not repeating attribute names, a JSON may be column-oriented.
The entire JSON is a dictionary. Keys are column names. Values are lists of values for that column, or dictionaries of row indices and column values, like in this rearranged version of the death causes data.
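Both column-oriented layouts described above can be sketched like this, again with invented values rather than the real death causes data.

```python
from io import StringIO

import pandas as pd

# Hypothetical column-oriented JSON where each value is a list.
columns_as_lists = """
{
  "cause": ["A", "B", "A"],
  "deaths": [120, 85, 131]
}
"""

# The same data where each value maps row indices to column values.
columns_as_dicts = """
{
  "cause": {"0": "A", "1": "B", "2": "A"},
  "deaths": {"0": 120, "1": 85, "2": 131}
}
"""

# Neither layout needs an orient argument.
df_lists = pd.read_json(StringIO(columns_as_lists))
df_dicts = pd.read_json(StringIO(columns_as_dicts))

print(df_lists)
print(df_dicts)
```

Each column name appears only once, no matter how many rows there are.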
7. Specifying Orientation
However, other orientations are possible. This version of the death data is split oriented, with separate lists for column names, indices, and values. Let's load it into a dataframe.
8. Specifying Orientation
We import pandas,
then load the data with read_json, specifying the orientation by passing "split" to the orient keyword argument. The pandas documentation lists the other accepted string values.
When we print the data, it looks like it loaded correctly.
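Those steps might look roughly like the following. The JSON text is a made-up stand-in for the split-oriented file, supplied in memory so the example is self-contained.

```python
from io import StringIO

import pandas as pd

# Hypothetical split-oriented JSON: separate lists for column names,
# row indices, and row values.
split_json = """
{
  "columns": ["cause", "deaths"],
  "index": [0, 1, 2],
  "data": [["A", 120], ["B", 85], ["A", 131]]
}
"""

# orient="split" tells pandas how to interpret the three top-level keys.
df = pd.read_json(StringIO(split_json), orient="split")
print(df)
```

Printing the dataframe is a quick check that the column names, index, and values ended up where you expect.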
9. Let's practice!
And that's a quick introduction to JSON. We'll dive deeper in later lessons, but now, it's your turn to practice working with JSON data!