Manipulating document data
Throughout this chapter, you've explored a number of tools to work with semi-structured document data in Postgres. In this final exercise, you'll put all of these tools to work to create an analytics-ready dataset. You'll be working with the nested_reviews
table, which takes the form shown below.
To help get you started, pandas
has been imported as pd
, and a connection object has been created and stored in the variable db_engine
. Best of luck!
This exercise is part of the course
Introduction to NoSQL
Exercise instructions
- Using the
#>
operator to return the nestedbranch
field from thelocation
object in thereview
column, as JSON. Alias asbranch
. - Query the
statement
field in thereview
column, using the->>
operator, aliasing the result asstatement
. - Filter results to only include records with a
reviewer
location of'Australia'
, with the help of thejson_extract_path_text
function.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Extract fields from JSON, and filter by reviewer location
query = """
SELECT
review_id,
____ #> '{____, ____}' AS ____,
____ ->> '____' AS ____,
rating
FROM nested_reviews
WHERE ____(____, '____', '____') = 'Australia'
ORDER BY rating DESC;
"""
data = pd.read_sql(query, db_engine)
print(data)