Windowing in Redshift
1. Windowing in Redshift
Now, let's explore the concept of windowing in Amazon Redshift.2. Windows functions
Window functions in Amazon Redshift are a powerful tool for performing calculations on a specific subset of rows in our query result. Unlike aggregate functions, window functions return a value for each row in the result set based on a window or partition of rows defined by the OVER clause. There are three main concepts to understand when working with window functions: partitioning, ordering, and framing. Partitioning divides the result set into windows or partitions. Ordering defines the order of rows within each window. Optional framing adds additional conditions to restrict the rows included in the window.3. Using windowing to calculate an average
Let's look at an example of windowing to calculate each division's average revenue within each year and month. We're using the AVG function OVER our window, which is PARTITIONed BY the data by division_id, year, and month. Using these fields for our window allows us to calculate the average revenue for each division in each specific month.4. Using windowing to calculate an average (cont)
Here are the results of our query. We can see how windowing allows us to perform calculations considering specific data windows, such as the division, year, and month.5. Using lag for month over month windows
Sometimes, we want to compare data across windows. LAG and LEAD are two functions that allow us to do just that. LAG helps us access data from any row before our current row in the same window, and LEAD will enable us to access data after our current row. The ORDER BY clause helps us determine what is before and after our row. In this example, we're using the LAG function to access data from the previous records. We want to know how many sales we had for the current month and how many in the prior month within each division. So first, we're counting the sales and then using lag to count the sales 1 row before ours. Since we are partitioned by division and ordered by year and month, we end up with a window that could count all the recordings within the division ordered by year and month.6. Using lag for month over month windows (cont)
As we look at the rest of the query, we can see that we're grouping by division, year, and month to match our partitions. We aggregate our counts for both the row and the prior row down to a specific month and then order them to make it easier to follow. Now, let's look at the results.7. Using lag for month over month windows (results)
We can see that the group by clause gave us one row for each division, year, and month. Then, the normal count did what we expected, and the lag window function was also grouped in the same way, giving us the prior year and month by our partition, which was division. We can see how the LAG function allows us to access data from the previous row in our window, providing valuable insights into month-over-month changes.8. Ranking data within windows
If we wanted to rank all the sales by division according to the highest revenue, we could use the RANK function, which ranks data within a window based on specific criteria. We can partition by the division and order by revenue in descending order.9. Ranking data within windows (results)
Each sale within each division is ranked based on its revenue, with the highest revenue sale receiving a rank of one. This shows how window functions like RANK can provide valuable insights into the relative performance of data within partitions.10. Let's practice!
Time for you to build some windows of your own.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.