Partitioning data in a window function
1. Partitioning data in a window function
It's finally time to put the "window" in window function! Let's hop in!2. Ranking data
Previously, we used the `RANK` function to create a result set that looks like this. It assigns a "ranking" to each record in the result set based on the kilometers traveled.3. Ranking data with partitions
But what if we want to do a bit of a different analysis? Let's say we want to assign a ranking to each distance traveled for that specific event. The results would look something like this. Guess what? We can do this!4. PARTITION BY
Enter, `PARTITION BY`. `PARTITION BY` helps us create "windows" of records to apply functions to. A `PARTITION BY` goes before the `ORDER BY` in the `OVER` clause. Here, we're partitioning by `event_name`. This creates a window where we'll rank records. Using `PARTITION BY` is similar to `GROUP BY`; however, it does not collapse records. This is where window functions become quite powerful.5. Ranking data with partitions
Let's say that we want to rank ticket prices for each level of the venue's seating, like this. Now, we're able to do this! In the `OVER` clause, we are partitioning records by `level`. We then `ORDER BY` price in descending order to create our rankings. `PARTITION BY` allows us to create windows, then apply functions to those windows.6. Generating summary metrics with FIRST_VALUE
Not only can we `RANK` records in a window; we can also generate summary metrics. Let's look at two functions; `FIRST_VALUE` and `AVG`. `FIRST_VALUE` will allow us to find the first value in a window. The syntax is the same as we've seen before, but now with `PARTITION BY`. We'll pass the column to return to `FIRST_VALUE`. Then, we'll `PARTITION BY` some field, before ordering by a field to determine the first record.7. Generating summary metrics with AVG
With `AVG`, the logic is quite similar. We'll pass a column to take the average of to `AVG`. Then, we'll `PARTITION BY` our desired field and alias the result. But there's one big difference; we don't need to add an `ORDER BY` when working with `AVG`.8. Customer satisfaction
In this example, there are two important values we want to find; the satisfaction score for each events closest attendee, and the average satisfaction score for each concert. To find the first ranked value, we'll pass `satisfaction_score` to `FIRST_VALUE`. Then, we'll partition by `event_name`, before ordering by `km_traveled`. We'll pass `satisfaction_score` to the `AVG` function and `PARTITION BY` `event_name` to find the average score for each concert. Here are the results!9. Customer satisfaction
Using `PARTITION BY` in this way gives us a meaningful way to compare each satisfaction score to the score of the closest concert goers, as well as the event's average score. And we're just getting started!10. Let's practice!
It's your turn to write a query or two. Good luck!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.