Non-tabular NoSQL data stores

1. Non-tabular NoSQL data stores

Now that we've taken a look at tabular NoSQL data stores, we'll explore their counterparts; non-tabular NoSQL data storage tools.

2. Document databases

Document databases store data in a flexible, semi-structured format, typically made up of key-value, key-array, and key-object pairs. Here, there's a single document, containing two key-value pairs, as well as a key-array, and key-object pair. Since document databases don't have a rigid schema, they're useful for capturing data that may not have otherwise been captured using a traditional, RDBMS. Here, the "topics" key-array pair can hold an arbitrary number of topics, and the "author" key-object pair can contain detailed information, specific to each author. This functionality makes document databases an especially popular source system when building data pipelines. We'll use Postgres JSON to get hands-on experience working with data stored in document format.

3. Querying JSON data with Postgres JSON

Document data can be stored in a column of a Postgres table using the JSON data type. In the "data science resources" table on the left, two documents are stored in the "books" column. But how can we query these documents? We'll need to extend the functionality of traditional SQL to do this. To return the "title" and "price" for all books by a certain "author", we can use the arrow, and the double-arrow operator. Here, the arrow operator is used to retrieve the "title" and "price" values from each document. In the "WHERE" clause, we leverage both the arrow and double-arrow operator to filter by the author's name. We'll explore the difference between the arrow and double-arrow operator more in Chapter 3. When this query is executed, it returns a tabular result that looks the table shown below. Later, we'll take a closer look at a number of other tools Postgres JSON offers to work with document data.

4. Connecting to a Postgres database

To connect to a Postgres database, we'll use the sqlalchemy library. A connection string is formed as shown above, and passed to sqlalchemy's create-engine function. The results is a connection object that can be used to connect to a Postgres database using tools such as pandas. A connection object will be created for you in each of the exercises that use Postgres JSON, and stored in the db-engine variable.

5. Writing and executing Postgres JSON queries

To make interacting with document data easier, we'll use pandas to query Postgres. After pandas is imported as pd, a Postgres JSON query can be written and stored as a string. Then, the query and previously-defined db-engine can be passed to the pd-dot-read-sql function. The resulting DataFrame is stored to a variable, and can be printed to show the results.

6. Other non-tabular NoSQL data stores

In addition to working with data stored in document format, we'll explore the intricate details of working with key-value data, using Redis. While both Postgres JSON and Redis store data in a similar format, Redis is optimized for fast, in-memory data manipulation and caching. We'll also touch on graph databases at a high-level, and discuss their common use-cases and implementations.

7. Let's practice!

Great work - time to get your hands dirty using Postgres JSON with a few exercises!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.