1. Aggregating Data
To start to get an understanding of a new dataset, it's common to count the number of observations and summarise any numeric columns. In this lesson, we will learn how to perform this using T-SQL. The examples use the grid database.
2. SUM - single column
It's straightforward to obtain the sum of a single column.
Starting with SELECT, type SUM, then open parentheses, then type the column name, then close the parentheses. You should also provide a meaningful alias, otherwise the results will be returned as (No column name), which will get confusing very quickly
3. SUM - two or more columns
Here is our original SUM query. We want to sum another column. Any ideas on how we should do this? We need to add an additional SUM statement, wrapping our extra column in parentheses, and aliased as before. You need to repeat this for each additional column you want to sum.
4. The wrong way...
If you don't prefix the second column with SUM, you get an error message. What this means, is that when you are using aggregate functions like SUM, every column you SELECT either needs to be used in an aggregate function, or it needs to be used as a grouping column, which we will cover later.
5. Use aliases
So what's wrong here?
That's right, I forgot to provide an ALIAS for my column names, and so I get these less than helpful defaults.
You probably aren't going to remember what these columns are, so don't forget to use AS to provide a meaningful name for your results.
6. COUNT
The simplest COUNT query will return a count of the number of rows in that particular column.
7. COUNT Distinct
By itself, a simple count may not be that useful. However, using DISTINCT to obtain a count of unique values, or adding a WHERE clause, can make the results much more meaningful.
8. MIN
To return the minimum value from a column, we use the MIN function, again with parenthesis and providing a column alias. You may need to amend this query, in case there are data quality issues. For example, if there are many zero values - you may want to exclude them from the results.
9. MAX
We can use the MAX function to obtain the maximum column value.
10. Average
Finally, you can use the AVG function, to return the average value of a column.
11. Let's practice!
OK, let's see how much you remember! Let's sum, count and aggregate.