Get startedGet started for free

Top items with window functions

1. Top items with window functions

In this video, we’ll learn how to obtain two or more top items per group using a window function.

2. Table description

Here is a reminder of our temperatures_monthly table. It contains temperature measurements aggregated on a monthly basis. The table has the following fields: station_id - the id of the weather station year_month - year and month t_monthly_min - monthly minimum temperature in Celsius t_monthly_max - monthly maximum temperature in Celsius t_monthly_avg - monthly average temperature in Celsius

3. Traditional aggregation

Let’s first see how we can use traditional aggregation to find the top items. For each station id, let's find the lowest monthly temperature for the year 2018 and attach the year and month for each low temperature. That cannot be done in the subquery GROUP BY clause, so we wrap the subquery in a second query, using a JOIN to locate the date associated with the low temperature found by the subquery. This is quite a bulky query!

4. Traditional aggregation

Here's what that result looks like. This query is problematic because it does not have a result for station five. That is expected because station five did not provide data for this year. It also returns two rows for station three, because two months share the same minimum temperature in 2018 for that station. Another limitation is that this approach gives us only the coldest single month of the year. For example, suppose we wanted the two coldest months. Window functions can address these issues more easily.

5. Using row_number

First, we number each row, for each station. We start the row numbering over when moving from one station to the next. We do this by adding a ROW_NUMBER function to the previous query. This function numbers the rows per partition. Inserting it in our query numbers each station's rows in ascending order.

6. Numbering the rows

Here are the first 15 rows of the result. Observe how the numbering starts over with station two. That is because the window function partitioned the data by station id.

7. Lowest temperatures

Now we can obtain the two lowest monthly temperatures for the year, for each station, by wrapping the previous query in an outer query that constrains the row number using a WHERE clause. We were also able to obtain the date column without having to use a JOIN.

8. Lowest temperatures

Here we see the result for the stations one, two and three. Our output has the two lowest monthly temperatures for 2018 for each station. The two coldest months always seem to fall in December, January, or February, which makes sense!

9. Highest values

What if we wanted to rank temperatures in the reverse order? Simply use descending order instead by adding the appropriate keyword to the ORDER BY clause.

10. Let's practice!

Let's cement what we just learned.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.