Get startedGet started for free

Unstructured data

1. Unstructured data

BigQuery was originally designed to work with unstructured machine-generated data. This takes the form of two data types in BigQuery, which we will learn more about in this section.

2. Why unstructured data is important

Unstructured data is common in BigQuery as it helps store more complex data structures that pertain to a specific row of data. For example, here we see a data structure that shows a customer and their email addresses.

3. ARRAYs

ARRAYs are similar to list in Python, which means they are ordered lists of zero or more elements. In addition to accessing the values in the array using array-specific functions, we can also access them using the item position and square brackets. As we see on the left, we are constructing an array from a subquery where we combine three values. In the example on the right, we access the second value using the number one between square brackets. Remember that zero will return the first value, so in this case, "analytics" will be the value returned.

4. STRUCTs

STRUCTs are more flexible data structures similar to a dictionary in Python or JSON. A STRUCT can contain a single value, key-value pairs, or even nested data with ARRAYs or STRUCTs inside the STRUCT, but remember that the structure must remain consistent for all values in that column. In the example on the left, we are constructing a STRUCT with one key value pair: a skill name and a boolean. After declaring the STRUCT, we define the key/value names and data types between the angle brackets followed by the data in parentheses. In the example on the right, we can access the key value using the dot notation for the key name, in this case, "skill".

5. ARRAY_LENGTH and ARRAY_CONCAT

Several functions are specific to ARRAYs, but ARRAY_LENGTH and CONCAT help combine and measure arrays. ARRAY_LENGTH allows us to find the number of elements in an array, as seen in the first query. The only argument we need to provide is a valid array. As shown in our second example, CONCAT allows us to concatenate or join two arrays together, which can be helpful when performing aggregations. This takes any number of ARRAY values or columns containing ARRAY data.

6. UNNEST

Another common operation is accessing data in ARRAYs or STRUCTs. BigQuery provides the UNNEST function, which allows us to flatten ARRAYs and turn them into rows within a table. In this example, we are turning an ARRAY with two values into two rows, in this case, the two emails from the prior example.

7. UNNEST with STRUCTs

We will also encounter STRUCTs inside of ARRAYs, which we can use UNNEST to access values inside that data. On the left, we see an example data structure of three STRUCTs inside an ARRAY. Suppose this data is in a column called "skills". To extract the key and value of each STRUCT in the ARRAY to an individual row, we must first extract the data by UNNEST-ing the data following the FROM clause, then use dot notation to access the individual values following the SELECT clause.

8. SEARCH

GoogleSQL also provides a flexible function to search across any data type, including unstructured data called SEARCH. This can be used with any data type but is particularly useful with unstructured data since it allows us to query the entire ARRAY or STRUCT for specific terms. It takes two required arguments, the data to search and the search query, and returns a boolean for each row. Here, we can search to see if any of the emails in our STRUCT are from the gmail.com domain.

9. Unstructured data cheat sheet

To wrap up, here we can see the different unstructured data formats and how to use these formats.

10. Let's practice!

Unstructured data can seem complicated, but let's practice our new skills so you can become a pro!