Grouping and Having
1. Grouping and Having
In this lesson, we'll learn to turbo charge our aggregate queries using GROUP BY, and apply additional filters to our results with HAVING.2. A simple SELECT
Here is a simple query, returning one value, representing the total SUM of the demand_loss column from the grid table. We'll almost always need to dig deeper below a headline figure. If we wanted a breakdown by the individual power outage descriptions - how could we achieve this?3. Grouping error
We might try adapting the previous query by adding the description field to the SELECT statement. But no, we get an error message. The error is telling us that the description field is invalid. It either needs to have an aggregate function applied to it, or it needs to appear in a GROUP BY clause.4. Add A GROUP BY clause
It wouldn't make sense to try and aggregate the description field as it is not a numeric column. However, by using the GROUP BY clause, which splits the data up into groups according to the values within the chosen column, and then applies the aggregation function, the query now works. We return a total sum for each description, including rows where the demand_loss value is NULL.5. Tidying up the GROUP BY query
Here's a slightly modified version of the query. We've filtered out NULL values from the demand_loss column, and restricted the description to fields that end with 'storm'. This gives us 7 separate groups, with total lost demand for each.6. HAVING
Let's recap what we've learned so far. We know we can use aggregate functions in our SELECT statement, and that we should provide a meaningful column alias. We apply filters to the data using WHERE. And we now know we can split the data into groups using GROUP BY. When we write a WHERE clause, the filtering takes place on the row level - that is, within the data. But, for example, what if we want to sum values based on groups, and then filter on those sums?7. Having in action
As a reminder, here is our existing query and result set. If we wanted to filter the lost_demand column of our results, how would we do it? We've already applied a WHERE clause. We could try adding an additional WHERE clause, but that would only affect the underlying row values, and NOT our overall grouped lost_demand values. How can we filter the results of this query to restrict the results to those where the sum of demand_loss_mw was greater than 1000?8. Having in action (II)
The answer is by adding a HAVING clause, after the GROUP BY clause. By typing HAVING, then SUM, then our desired column name in brackets, and then the condition we want to apply, we arrive at the final result set we need.9. Summary
Here's a quick overview of what we've covered. The main things to remember are that GROUP BY splits your results up into combinations of 1 or more columns - so for example if you wanted to break sales by territory, you would group by territory. Our chosen aggregation functions are then applied to those groups HAVING clauses are applied after the GROUP BY, and are used to either filter on the groups, or to filter using aggregate values such as SUM or AVG (average).10. Let's put our skills to the test!
OK,let's see how we can make our query results more interesting using our new found skills!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.