Get startedGet started for free

Using aggregation functions over windows

1. Using aggregation functions over windows

In the last chapter, we looked at different ways to group and reshape data over time. Sometimes, we would like to see the end results of aggregation without changing the grain of our data. That's where window functions come into play.

2. Ranking functions

There are four ranking window functions which SQL Server supports: ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE(). We'll be covering the first three. ROW_NUMBER() provides a guaranteed unique, ascending integer value which starts from 1 and continues through the end of the set. RANK() provides an ascending integer value which starts from 1, but it is not guaranteed to be unique. Instead, any ties in the window will get the same value and then the next value gets its ROW_NUMBER() value. If, for example, the second and third entries are tied, both will get a rank of 2 and the fourth entry will get a rank of 4. DENSE_RANK() behaves like RANK(), except that it does not skip numbers even with ties. If the second and third entries are tied, both will get a dense rank of 2 and the fourth entry will get a dense rank of 3. For the next couple of examples, we will use this simple table of runs scored.

3. Calculating row numbers

All ranking functions require an OVER() clause with an ORDER BY clause inside it. Here, we are ordering our results by the number of runs scored in descending order. What we get back is a unique, ascending integer starting from 1.

4. Calculating ranks and dense ranks

All ranking functions have the same syntax. Here you can see RANK() and DENSE_RANK() together, looking the same as ROW_NUMBER() did. The difference is in the results: whereas ROW_NUMBER() gave us one unique integer per record, we get four groups of results. The difference between RANK() and DENSE_RANK() is how ties behave.

5. Partitions

In addition to an ORDER BY clause, the OVER() clause in a window function can accept a PARTITION BY clause which splits up the window by some column or set of columns. Now we'll introduce the two teams who played and partition runs scored by team. SQL Server handles this and provides us a unique, ascending row number for each team, so the Arizona team has three entries and the Florida team has three entries. Note that Florida's row numbers reset to 1--we are guaranteed to start at 1 for each window and the `PARTITION BY` clause defines our window.

6. Aggregate functions

In addition to ranking functions, we can also use windows on aggregate functions. As a quick reminder, aggregate functions include functions like AVG(), COUNT(), MAX(), MIN(), and SUM(), as well as several others. Here, we want to see the maximum number of runs each team scored as well as the runs they scored in a particular game. Instead of writing two separate queries, we can get both pieces of information in a single query. This lets us see how close each team is to peak offensive efficiency.

7. Aggregations with empty windows

Going one step further, an aggregate function with an empty OVER() clause does the same thing as the non-windowed aggregation function except for one difference: it does not require that we group by non-aggregated columns. Here is an example where we use the MAX() function with an empty OVER() clause. What we get back is every single row as well as the maximum number of runs scored. This is akin to writing a subquery or common table expression which queries the data twice, once to get the individual rows and a second time to get the maximum number of runs.

8. Let's practice!

With a bit of window function experience under our belts, let's do some exercises.