Intermediate Aggregate Functions

1. Intermediate Aggregate Functions

In this video we’re going to go a step further than the standard aggregation functions and focus on Snowflake’s grouping extensions: GROUPING SETS, ROLLUP, and CUBE.

2. Order of Operations

SQL doesn't run in the order you write it. It starts by building the dataset: FROM, JOIN, then WHERE to filter rows. Then it groups and aggregates with GROUP BY and HAVING. Finally it shapes the output with SELECT, ORDER BY, and LIMIT. The key thing to remember: WHERE runs before GROUP BY, so if you want to filter on an aggregation, you need HAVING, not WHERE.

3. Common Aggregate Functions

You probably recognize these common aggregate functions: COUNT, SUM, AVG, MIN, and MAX. One worth highlighting is COUNT with DISTINCT. COUNT star counts every row; COUNT DISTINCT counts only the unique values in a column. Here, that gives Harbr both the total number of shipments and the number of distinct carriers involved - two different questions answered in one query. The challenge with aggregate functions isn't computing a single aggregation - it's computing several levels at once without duplicating your query.

4. The Problem with Multiple Queries

Let’s walk through the example. To get warehouse totals, region sub_totals and grand totals you have to write three different queries. You could UNION ALL them together, but now the same logic lives in three places, and any schema change means updating all three. GROUPING SETS solves this in a single query.

5. GROUPING SETS

GROUPING SETS lets you define exactly which summaries you want to calculate. The first set is region and warehouse_name which gives you the detailed breakdown. The second which is a region on its own which gives you the regional subtotals, and the empty set gives you the grand total. Now if we look at the output, whenever you see NULL it doesn’t mean missing data but that the column has been aggregated. So EMEA with a NULL warehouse is a regional total, and when both columns are NULL that’s your grand total. It's important to note if we want to add more meaning outside of NULL, we would need to implement a CASE statement.

6. ROLLUP

Next we’ll cover ROLLUP which is a shorthand for a pattern of GROUPING SETS. With two columns, ROLLUP produces three grouping levels automatically with both columns together, the first column alone, and the grand total. The output looks identical to the GROUPING SETS example. ROLLUP is just a more concise way to express a top-down hierarchy. Add a third dimension and ROLLUP keeps rolling up through every level. It's the right choice when your grouping columns have a natural parent-child relationship and want to analyze hierarchical data.

7. CUBE

CUBE goes further than ROLLUP. It computes every possible combination of the specified dimensions, including ones that ROLLUP skips. Notice the rows where region is NULL but warehouse_name has a value. Those are warehouse-level totals across all regions. ROLLUP doesn't produce those as it only moves up a fixed hierarchy whereas CUBE produces all combinations. With two dimensions that's four grouping sets; with three dimensions it's eight. You should use CUBE when your reporting needs subtotals in every direction, where you analyze data across independent axes.

8. GROUPING() and GROUPING_ID()

NULL in grouped results is ambiguous — is it a missing value, or a subtotal? GROUPING clears that up. It returns 1 when a column has been aggregated away, and 0 when it's a real value. GROUPING_ID is an alias of GROUPING — they are essentially the same function. But the way you use them differs. When you pass GROUPING a single column, you get one flag back. GROUPING_ID becomes useful when you pass in multiple columns as it combines those individual GROUPING flags into a single number, giving each aggregation level a unique identifier. In the output, 0 means full detail, 1 means warehouse is aggregated, and 3 means both region and warehouse are aggregated away so that's your grand total row. This makes it easy for downstream tools to tell summary rows apart from detail rows, without having to parse NULLs and guess at whether they're meaningful or structural.

9. Let's practice!

Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.