Basic aggregate functions

1. Basic aggregate functions

We have learned about different ways we can convert between strings and the different date and time data types. In this chapter, we will learn about different functions which help us aggregate time series data.

2. Key aggregation functions

SQL Server has a dozen or so aggregate functions. In this lesson, we will learn about five of these plus an extra variant. There are two counting functions: COUNT() and COUNT_BIG(). Both of these return the total number of rows for a given expression. COUNT() returns an integer and COUNT_BIG() returns a 64-bit integer or BIGINT. You can also get the cardinality of an expression by adding the DISTINCT clause inside COUNT() and specifying a column or expression. Aside from getting counts, there are a few other aggregate functions we will look at. The first is SUM(), which provides the sum of an expression. If you only want the minimum or maximum value for an expression, you can use MIN() and MAX() respectively.

3. What counts with COUNT()

The COUNT() function has some interesting behavior depending upon the expression you put into it. If you use COUNT(*), you will get back the total number of rows in the table. But that also holds true for COUNT(1), saying that COUNT() doesn't care about the expression. Most interesting of these, you can even put in an illegal operation like dividing 1 by 0 and the COUNT() function will happily give you the number of rows. But if you specify a column, now it gives you the count of values where that column is not NULL. Here, we get the count of rows with a non-NULL year. In the prior slide, I mentioned expressions rather than columns. That's because you can do things like this, where I use the NULLIF() function to set the year value to NULL if it is 1990 and then we get the count of non-NULL values.

4. Distinct counts

When using COUNT(DISTINCT), we need to define an expression which includes a column. Then we get the number of unique, non-NULL values which appear in the data set. In this query, we get the distinct count of calendar years as a "Years" column and the distinct number of non-2010 years as Y2. This gives us 50 rows for Years and 49 for Y2.

5. Filtering aggregates with CASE

You can also build expressions with the CASE() operator. These expressions allow you to build in filters to perform aggregates on particular subsets of the data. In this example, we want to get the latest incident date for incident types 1 and 2 and pivot the data so we have one column per incident type. To do this, we combine MAX() with a CASE statement which returns the incident date when we match incident types and returns NULL otherwise. We can see the results of this test, showing that incident type 2's last occurrence was June 29th. The code for this technique becomes a bit unwieldy as the number of pivot operations increases, but as far as performance goes, this is one scan of the incident roll-up table no matter how many incident types we pivot.

6. Let's practice!

Now that you're armed with knowledge on basic aggregate functions, let's knock out a few exercises and learn about some more aggregate functions afterward.

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.