Semistructured data
1. Semistructured data
In today's world, semistructured data such as JSON is everywhere. Let's explore how to query and extract values from it.2. Semistructured data
In Redshift, we handle semistructured data using the SUPER type and have dedicated SQL functions working with different formats. Let's delve deeper by focusing on the JSON objects, which are key-value pairs enclosed in curly brackets, and their functions.3. JSON Validation Functions
We have tools to ensure our JSON data is in a valid format. Functions like IS-VALID-JSON validate entire JSON objects, helping maintain data. It checks whether a field or a JSON string is valid and readable, returning a boolean. Notice if we are missing a closing bracket, it will tell us it's invalid. IS-VALID-JSON-ARRAY works similarly but looks for an array of values. Notice it errored when the first one was a scalar value, not an array.4. Extracting from a JSON object
Extracting data from a JSON object is a breeze with functions like JSON-EXTRACT-PATH-TEXT. It swiftly retrieves values based on specified keys, simplifying the extraction process. For example, if we have a JSON object with two keys and I want to get the value for the key of one, we pass the key name as a parameter after the field or JSON string.5. Attempting to parse invalid JSON
If the JSON is invalid, we'll get an error, as shown here. That's why our JSON validation functions are so valuable to ensuring smooth operations and reliable workflows.6. Extracting from nested JSON paths
A JSON object that has one or or more values that are also JSON object is a nested object. We can pinpoint and extract data from intricate JSON structures by supplying all the keys in the path. In this example, we want the value of the nested_three key nested in the one_object key's value. We pass all the keys required to get there, so in this case, it's keys one_object and nested_three. This query returns the value of three.7. Extracting from nonexistent nested JSON paths
If the JSON is valid, but the key we are trying to extract doesn't exist, it will return NULL. For example, we are attempting to get the nested_five key from an object in the two key's value; however, the two key's value is just the number 2, so there can not be a nested_five key. So, the query returns NULL.8. Extracting from a JSON array
Navigating JSON arrays is handled with JSON-EXTRACT-ARRAY-ELEMENT-TEXT, which takes a JSON field or string and then an integer of the array index we want to retrieve; these indexes start at zero. In this example, we're extracting the final key from the array, index position 2. This query will give us the value 13. If we supply an index not present in the array, we will get a NULL, as shown here.9. Extracted from a nested JSON array
Combining extraction functions, we can traverse nested JSON arrays. As shown here, we are using JSON-EXTRACT-PATH-TEXT to get the array in nested_two key's value. Then, the enclosing JSON-EXTRACT-ARRAY-ELEMENT-TEXT function is used to get index one from the nested array, which is the number 4. This is a lot of typing, though.10. Extracting from a nested JSON array shortcut
Luckily, we can use the JSON-EXTRACT-PATH-TEXT function with multiple path strings. Here, we're performing the same extraction as the prior slide but just passing both nested_two and the array index to the function. The array index must be a string.11. Casting in CTEs
When we are copying and pasting JSON to work on, one trick is to make a CTE and to convert it to SUPER then to VARCHAR so that it parses like a JSON string stored in a table. In this example, we're doing that with the location_details and aliasing it AS data, so it's accessible as location_details.data.12. Let's practice!
Now, it's your turn to wrangle this JSON into submission.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.