Flattening Semi-Structured Data
1. Flattening Semi-Structured Data
It's time to do something a little different! Let's see what we can do with semi-structured data in Snowflake!2. Structured data
By now, we've seen data organized in rows and columns, like this. However, Snowflake can also handle semi-structured data, which looks a bit different. Believe it or not, the record for school s_19219 contains the same information in both outputs.3. Semi-structured data
In Snowflake, data stored using braces and key-value pairs takes the data type `VARIANT`. This data might look like a Python dictionary or a JSON object. Semi-structured data allows data to be stored in a more "raw" format, with nested objects like `address`. There are two different ways to retrieve this data. First up, dot-notation!4. Dot-notation
Dot-notation makes it easy to retrieve top-level and nested values from variant data. To clarify, `my_first_key` is top-level data, while `a` is nested. The syntax is quite basic; in a `SELECT` statement, we'll start with the name of the column. After the column is a colon, followed by the top-level key. If we'd like to retrieve nested data, you'll follow the top-level key with a dot and the nested-key to retrieve. We can continue to use this syntax to retrieve deeply-nested values.5. Dot-notation
We're using dot-notation to retrieve the `school_name` from the `address_info` column. This is done using the syntax `address_info-colon-school_name`. To retrieve nested fields such as `street_number`, we'll use the syntax `address_info-colon-address-dot-street_number`. We can use this same syntax to find the street name and suffix.6. Bracket-notation
We can also retrieve top-level and nested values from `VARIANT` data with bracket-notation. To the extract `my_first_key` value-pair, we'll use my_column-bracket-quote-my_first-key-quote-bracket. To retrieve a nested key-value pair, we'll add another bracket-quote-key-quote-bracket on the end of the existing statement. This makes it easy to retrieve data from many nested layers. For Python developers, this syntax might feel a bit like retrieving data from a Python dictionary. Pro tip; when using bracket-notation, make sure to use single-quotes!7. Bracket-notation
Using bracket-notation, we can now retrieve key-value pairs from the `address_info` column. Passing `school_name` in quotes and brackets after `address_info` pulls the school name from the semi-structured data. To parse the nested `city` key-value pair from `address_info`, all we need to do is `city` in quotes and brackets after address. The same applies for `zip_code`.8. Transforming semi-structured data
Let's put it all together. Using both dot and bracket-notation, we can parse useful information from semi-structured data. If you look closely, you'll see that we're retrieving both top-level and nested key-value pairs from the `address_info` column.9. Transforming semi-structured data
We've made it full-circle; from this semi-structured data shown above, we're successfully manipulated it into the tabular output here. This is quite a useful skill when working with data in Snowflake!10. Let's practice!
Now, it's your turn to put some more structure into semi-structured data!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.