Get startedGet started for free

Working with LAG() and LEAD()

1. Working with LAG() and LEAD()

The LAG() and LEAD() functions give us the ability to link together past, present, and future in the same query.

2. The LAG() window function

Let's take a look first at the LAG() function. LAG() gives you a prior row in a window given a particular partition strategy and ordering. As you can see here, it looks a lot like an aggregate window function, where it takes in as a parameter an expression. In this case, my expression is the NumberOfVisits column. We then partition by customer ID and order by the month start date. The result is a table which includes the current row as well as the prior record's row. We don't have any data for customer 1 prior to December of 2018, so the prior column is NULL.

3. The LEAD() window function

The LEAD() window function is just like LAG() except that it looks at the next record instead of the prior record. Our query is exactly the same as the prior example except for changing LAG() to LEAD() and switching the order of columns. The results, as you might expect, return the current number of visits and the next month's number of visits per customer. Once we get to the end of the data set, the final record's next month value will be NULL because we will have reached the end of the window.

4. Specifying number of rows back

Both LAG() and LEAD() take an optional second parameter which represents the number of rows back to look. In this example, we include two LAG() functions, one looking two rows back and the other looking one row back. The result is that each row has up to three numbers, representing the last three months of visitation data. This can help management and report viewers see short-term trends, such as the jump between December and January.

5. Windows and filters

Before we get into the exercises, I would like to clarify one last thing around windows. Let's say we have a simple SQL query like this one, where we get a lagged value for each date. Our end result looks like this table, where we have one row per day. As expected, the first day's prior value is NULL. Then, let's add a filter where the date must be greater than January 2nd. The end result gives us one row and its prior value is NOT 6. Instead, it is NULL. The reason for this outcome is that LAG() and LEAD() execute after the WHERE clause, so we've already thrown out the records with data for January 1st and 2nd, leaving January 3rd as our first available date. Without those thrown-out records, we don't have any knowledge of what the prior value was.

6. Windows and filters and CTEs

If you do want to preserve these prior values, there are a few ways we can do this. One method is to use a common table expression. In our query, we run LAG() against the entire data set so that we have every prior value. We'll call that result "records" and reference it in the main query, where we subsequently filter out values earlier than January 3rd. The end result is that we still get one row back, but this time we know the prior value as well as the current value.

7. Let's practice!

Let's now try out a few exercises using LAG() and LEAD().