Get startedGet started for free

Handling semi-structured data

1. Handling semi-structured data

Let's dive into semi-structured data, primarily focusing on JSON.

2. Structured versus semi-structured

Structured data is highly organized and easily searchable, with a well-defined schema and a fixed structure. By contrast, semi-structured data lacks a definitive structure, but it provides flexibility. If some customers have extra information such as multiple email addresses, we can handle it. This data representation is called JSON.

3. Introducing JSON

JSON, or JavaScript Object Notation, is a commonly used semi-structured data format in Web APIs and configurations. It uses key-value pairs separated by a colon. For instance, here `cust_id` is a key, and 1 is the corresponding value.

4. JSON in Snowflake

Good News! Snowflake natively supports JSON with built-in functionality to efficiently store, process, and query JSON data, making it perfect for evolving schemas. While Postgres uses the `JSONB` data type, Snowflake uses the `VARIANT` data type to handle semi-structured data.

5. How Snowflake stores JSON data

The `VARIANT` data type supports both objects and arrays. An object contains key-value pairs, while an array is an ordered list of values. We can create a Snowflake table with `VARIANT` columns to handle these varied data types, as demonstrated in the customer_info example.

6. Semi-structured data functions

Snowflake's built-in functions help us work with semi-structured data like JSON. One such function is `PARSE_JSON`. This function converts JSON-formatted strings into a `VARIANT` type, returning a valid JSON object, simplifying how we store and query JSON.

7. PARSE_JSON

Let's say we have customer JSON data enclosed in strings. We pass it to `PARSE_JSON`, which returns a JSON object as a `VARIANT` data type.

8. OBJECT_CONSTRUCT

Additionally, `OBJECT_CONSTRUCT` can create JSON objects from key-value pairs. Here, we pass the keys and their respective values as arguments, all separated by commas rather than colons, of our customer information to the function. It returns a JSON object, as shown.

9. Querying JSON data in Snowflake

Now that we know how JSON data is stored, let's query it. Snowflake provides a colon notation to retrieve values from specific keys. For example, to access the age, name and email value from the 'customer_info' column, we use a colon to separate the column name and the key. The result will display separate column values for each key.

10. Querying nested JSON Data in Snowflake

Let's take it one step further and consider nested data. Here, the address shows nested properties like street, city and state. To query these values from the nested "address", we can use either colon or dot notation.

11. Querying nested JSON using colon/dot notations

Colons separate each level of the nested JSON structure. For instance, we can use the given query to access the street name nested within the 'address' key in the 'customer_info' column. Alternatively, dot notation starts with a colon for the first level and uses dots for subsequent levels. Here, we have separated address and street with a dot. The end result is the same.

12. Let's practice!

You're now ready to experiment with semi-structured data in Snowflake. Let's dive in!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.