Get startedGet started for free

Special aggregations in BigQuery

1. Special aggregations in BigQuery

BigQuery provides additional aggregate functions that increase performance. While standard aggregate functions like COUNT, SUM, and AVG are widely used, these specialized aggregate functions cater to specific analytical needs.

2. Introduction to special aggregates

Special aggregate functions in BigQuery are designed to handle large datasets efficiently and provide approximations for certain calculations. These functions offer advantages such as reduced processing time and resource consumption, making them ideal for analyzing massive datasets. This table shows the special aggregates we will cover and their categories: approximate aggregation, string/array manipulation, and logical operations.

3. ARRAY_CONCAT_AGG

The ARRAY_CONCAT_AGG function is designed to address the limitations of the standard ARRAY_AGG function, which only accepts a single array as input. In this example, we aggregate 'order-items' arrays for each order item ('order-items'), merging their skill sets into a single 'all-items' array. ARRAY_CONCAT_AGG simplifies array aggregation, allowing us to merge arrays within groups facilitating streamlined handling of array data in BigQuery.

4. STRING_AGG

The STRING_AGG function in Google BigQuery is a versatile tool for concatenating strings into a single string. It takes a column of strings as input and returns a single concatenated string, optionally separated by a specified delimiter. This function is particularly useful when working with text data and allows us to effectively combine and summarize textual information. In this query, we are returning one column of data of all customer IDs separated by a comma with estimated delivery dates within the next three days.

5. APPROX_COUNT_DISTINCT

"approx-count-distinct" offers a quick estimation of the count of distinct values, particularly beneficial when dealing with large datasets, providing faster results while maintaining reasonable accuracy in BigQuery. In this query, we are counting the distinct number of order_ids as unique orders and grouping the results by 'customer-id'.

6. APPROX_QUANTILES

The APPROX_QUANTILES is used when estimating quantiles in a large dataset. It provides approximate quantile values for a numerical column, offering performance advantages over the traditional percentile functions. Here we are finding the approximate quantiles with 4 bins as defined in the second argument. We are finding the quantiles for each of the different product categories.

7. APPROX_TOP_COUNT

The APPROX_TOP_COUNT function in Google BigQuery identifies the top K elements in a large dataset based on their occurrence within the query parameters. Using probabilistic techniques, APPROX_TOP_COUNT provides an approximate and efficient way to identify the top K elements. In this query, we find the top 3 customers by product category, where we can identify customers with common order patterns.

8. APPROX_TOP_SUM

The APPROX_TOP_SUM function finds the top K elements in a large dataset and then aggregates the sum of those data points. It also uses probabilistic techniques similar to APPROX_TOP_COUNT. As you can see in this example, APPROX_TOP_SUM takes three arguments. The first item is the category we want to group by, in this case, the "item-id". This is followed by the price, which the SUM will be calculated on, and then the number 3, which will return the top 3 results. This query is grouped by the "seller-id" which will show us the top three items from each seller and the total sales.

9. LOGICAL_AND and LOGICAL_OR

The final approximate aggregate functions we will cover are LOGICAL_AND and LOGICAL_OR. In an aggregate query on a boolean column, LOGICAL_AND will return "true" only if all conditions are true, and LOGICAL_OR will return "true" if all conditions are true. In this query, we use LOGICAL_AND to see if all orders for that customer have been shipped and LOGICAL_OR to see if at least one has shipped.

10. Let's practice!

With that, let's put our new knowledge to the test!