1. Filtering with HAVING
In this lesson, we'll examine the differences between WHERE and HAVING and some best practices for how and how not to use them.
2. HAVING processing order
HAVING, like WHERE is also used to filter a query. Although they both occur after FROM and before SELECT in the processing order, that's where the similarity stops.
3. Grouping with WHERE row filter
Remember from a previous lesson, WHERE is used to filter individual rows. HAVING, on the other hand, is used to filter grouped or aggregated rows. Here is an example from the PlayerStats table. I want to know the sum of total points, by Team, but restricted to shooting guards. We use WHERE to filter just for rows where the Position equals SG. This works because the filter condition is on individual rows.
4. Row filtering with HAVING
But, we're grouping by Team. I could also use HAVING to restrict rows to only shooting guards. This query produces the same results, but it is not the correct way to use HAVING. In this case, WHERE is more efficient because rows are filtered first before grouping and summing. Using HAVING in this way would filter the individual rows after grouping, therefore, unnecessarily tying up resources and potentially increasing the time it takes for a query to run. Don't use HAVING to filter individual or ungrouped rows.
5. Aggregating by group
So what is the point of HAVING? HAVING is generally used to apply a numeric aggregate filter on grouped rows. Let's say we want a query that returns all rebound statistics by Team, where team offensive rebounds are 1000 or more. First, we create a query that calculates rebound statistics grouped by team.
6. Group filtering with WHERE
We have the column ORebound, for offensive rebounds already, therefore, we can apply a WHERE filter so that ORebound is greater than or equal to 1000. The query looks right and runs without error. No teams meet that condition. But, take a closer look at the query. This filter is not the one we want. We want a filter for Team offensive rebounds not individual players which is what this one does. I am pretty confident no single player had 1000 or more offensive rebounds in a season. We need to use HAVING because the filter is on a grouped numeric aggregate.
7. Without an aggregate function
Can we substitute WHERE for HAVING and use the same filter condition? No. Merely doing this will raise an error when we run the query. A numeric column that is using a HAVING filtering condition must be enclosed in an aggregate function. Also, note here the position of HAVING in this query, it is always after GROUP BY.
8. With an aggregate function
When applied correctly, as in this example, we enclose the filtering column in the SUM() aggregate function. One team meets the condition - incidentally, the team I support. Go OKC!!
9. Summary
Let's summarize what we've learned. Do not use HAVING to filter individual or ungrouped rows. Use WHERE to filter individual rows and HAVING for a numeric filter on grouped rows. We can only apply HAVING to a numeric column in an aggregate function filter.
10. Let's practice!
Let's practice.