Get startedGet started for free

Flattening Semi-Structured Data

1. Flattening Semi-Structured Data

It's time to do something a little different! Let's see what we can do with semi-structured data in Snowflake!

2. Structured data

By now, we've seen data organized in rows and columns, like this. However, Snowflake can also handle semi-structured data, which looks a bit different. Believe it or not, the record for school s_19219 contains the same information in both outputs.

3. Semi-structured data

In Snowflake, data stored using braces and key-value pairs takes the data type `VARIANT`. This data might look like a Python dictionary or a JSON object. Semi-structured data allows data to be stored in a more "raw" format, with nested objects like `address`. There are two different ways to retrieve this data. First up, dot-notation!

4. Dot-notation

Dot-notation makes it easy to retrieve top-level and nested values from variant data. To clarify, `my_first_key` is top-level data, while `a` is nested. The syntax is quite basic; in a `SELECT` statement, we'll start with the name of the column. After the column is a colon, followed by the top-level key. If we'd like to retrieve nested data, you'll follow the top-level key with a dot and the nested-key to retrieve. We can continue to use this syntax to retrieve deeply-nested values.

5. Dot-notation

We're using dot-notation to retrieve the `school_name` from the `address_info` column. This is done using the syntax `address_info-colon-school_name`. To retrieve nested fields such as `street_number`, we'll use the syntax `address_info-colon-address-dot-street_number`. We can use this same syntax to find the street name and suffix.

6. Bracket-notation

We can also retrieve top-level and nested values from `VARIANT` data with bracket-notation. To the extract `my_first_key` value-pair, we'll use my_column-bracket-quote-my_first-key-quote-bracket. To retrieve a nested key-value pair, we'll add another bracket-quote-key-quote-bracket on the end of the existing statement. This makes it easy to retrieve data from many nested layers. For Python developers, this syntax might feel a bit like retrieving data from a Python dictionary. Pro tip; when using bracket-notation, make sure to use single-quotes!

7. Bracket-notation

Using bracket-notation, we can now retrieve key-value pairs from the `address_info` column. Passing `school_name` in quotes and brackets after `address_info` pulls the school name from the semi-structured data. To parse the nested `city` key-value pair from `address_info`, all we need to do is `city` in quotes and brackets after address. The same applies for `zip_code`.

8. Transforming semi-structured data

Let's put it all together. Using both dot and bracket-notation, we can parse useful information from semi-structured data. If you look closely, you'll see that we're retrieving both top-level and nested key-value pairs from the `address_info` column.

9. Transforming semi-structured data

We've made it full-circle; from this semi-structured data shown above, we're successfully manipulated it into the tabular output here. This is quite a useful skill when working with data in Snowflake!

10. Let's practice!

Now, it's your turn to put some more structure into semi-structured data!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.