Working with semi-structured data in Snowflake
1. Working with semi-structured data in Snowflake
Welcome back! In this final video of this chapter, we'll discuss one of the unique features of Snowflake's functionality - working with semi-structured data.2. Semi-structured data in Snowflake
In addition to being column-oriented, Snowflake also supports working with semi-structured JSON data. As we saw before, data in JSON format looks similar to a Python dictionary. It contains keys and values and can support nesting, as we see with the size key and its nested object, another JSON object. There are a number of tools and technologies that create JSON data, the main one being APIs. Often, JSON data needs to be stored in its raw format before it is transformed; this is where Snowflake comes in. Using the VARIANT type, JSON data can be stored in a single column of a Snowflake table.3. Semi-structured data types in Snowflake
In addition to the VARIANT type, Snowflake also supports the OBJECT and ARRAY types. You can think of the OBJECT type as quite similar to Python dictionaries, and the ARRAY type as similar to Python lists. In this video, we'll focus entirely on the VARIANT type, which can contain data in both list-like and dictionary-like format. As mentioned earlier, VARIANT allows for semi-structured data to be stored in a single column, as shown on the right. Here, the column library takes the type VARIANT and stores two OBJECTs. Snowflake offers two ways to query data from a column of type VARIANT: bracket notation and dot notation. First, let's take a look at bracket notation!4. Querying semi-structured data with bracket notation
To query the unstructured data in the books table, we can use bracket notation. To query the "ISBN_13" field shown on the left, we can SELECT the "ISBN_13" field of the library column from the books table using brackets. The result returned below has two rows, one for the value stored at the "ISBN_13" key in each OBJECT in the library column. This technique may be intuitive for Python users, as it's similar to extracting values from a Python dictionary using bracket notation.5. Querying semi-structured data with dot notation
In addition to bracket notation, Snowflake also offers dot-notation to extract data from a column of type VARIANT. Here, we SELECT the "ISBN_13" and "publisher" fields from the library column of the books table. We use colons between the column name and the top-level field to denote the data that we'd like to return. If querying a nested a field, we use a "dot" between the parent and child fields. The result is a table with two rows and two columns containing the "ISBN_13" and "publisher" data.6. Querying nested semi-structured data
Sometimes, data is nested in a VARIANT field. When this is the case, we can use either bracket or dot notation to extract the nested data. In this example, we'd like to pull information about the size of the two books. To do this with the dot notation, we use the syntax column_name, colon, field_name, dot, field_name" to extract the nested field. This is done once for the dimensions data point and once for the weight. We can do something similar with bracket notation by adding an additional set of brackets and a field name to retrieve the nested information. Both these queries, while different, return an identical table with three columns and two rows containing the "ISBN_13" field, as well as the dimension and weight of each book.7. Let's practice!
Awesome work! Time to practice query semi-structured data in Snowflake with a few exercises. Good luck!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.