Get startedGet started for free

Time between events

1. Time between events

You know how to subtract one date from another. But how do you find out how much time has passed between events, when the dates or timestamps are all saved in the same column?

2. The problem

For example, here is data from a sales table with a timestamp for each sale. Our question is: how much time passes on average between each sale?

3. Lead and lag

The lead and lag functions let us offset the ordered values in a column by 1 row by default. Then we can subtract the original values from the lead or lag of the values to get the difference between events. Before we talk about the syntax, let's look at the results of the function calls. The lag function pushes all of the values down one row. NULL is inserted at the beginning of the lag column,so that the first sales time,at 9:07, is now the second value. The last sales time is discarded. The lead function does the opposite,pulling all values up one row. The second sales time of 9:13 becomes the first value,and NULL is added at the end of the lead column. The first sales time is discarded.

4. Lead and lag

Okay, back to the syntax. For the lead and lag functions to work, you have to specify how rows should be ordered. Remember that the rows in a database table have no inherent order to them - they are only ordered when you explicitly specify an order. Lead and lag are window functions. You start with the function name, and supply the column you want to apply the lead or lag to as the argument. You then add an "over" clause with the keyword OVER and an "order by" statement specifying how the rows should be ordered. The "order by" statement goes in parentheses. In this example, we have ordered by the same column that we want to lead and lag: date, but this isn't a requirement. We'll see an example where these columns are different shortly.

5. Time between events

But first, how do we use lead and lag to compute the time between sales? If we order dates from oldest to newest, we want to subtract the lagged date from the current date to compute the gap between each sale and the previous sale. We have one less gap value than the number of sales.

6. Average time between events

To compute the average gap, we need to use a subquery. We cannot simply wrap the average function around the difference between sales because window functions can't be used inside aggregation functions like average. The average time between sales is 32 minutes and 15 seconds.

7. Change in a time series

The lead and lag functions are not limited to date/time data. As mentioned before, you can order the rows of the table by one column while getting the lead or lag of a different column. We often want to do this to compute changes in a time series. A time series is any variable that has a date or time associated with each value. Here, we want to see not how much time passes between each sale, but how the amount sold changes from one sale to the next. We can't compute a change for the first value in a time series because there is no previous value. Looking at the change column, for the second sale, the amount was 19 less than the first sale. The last sale was 35 more than the previous sale.

8. On to the exercises!

Alright, with that, it's once again time to practice. On to the exercises.