Get startedGet started for free

Aggregating with date/time series

1. Aggregating with date/time series

When counting observations by month or day, the result only includes rows for values that appear in your data. How do you find periods of time with no observations?

2. Generate series

Recall the generate_series function, which you used with numeric data. The same function can be used with date/time data. generate_series expects timestamps for the from and to arguments. Dates will automatically be cast to a timestamp. The last argument is an interval. For example, here we have an interval of two days. The result is a series of timestamps between the start and end values separated by the interval.

3. Generate series

Here's an example with an interval of hours. The last value in the series will be less than or equal to the ending timestamp specified. For example, here the series ends at 8pm on January 1st, because the next value in the series would be greater than the 0th hour of January 2nd.

4. Generate series from the beginning

To get consistent values, generate series using the beginning of a month or year, not the end. For example, attempting to generate a series for the last day in each month produces unexpected results. When you add one month to January 31st, you get the last day in February, the 28th, because there is no 31st. But then 1 month after February 28th is March 28th, not March 31st.

5. Generate series from the beginning

To correctly generate a series for the last day of each month, generate a series using the beginning of each month, then subtract 1 day from the result.

6. Normal aggregation

Series can also be used to find units of time with no observations. For example, you might want to count sales by the hour of the day they occurred. Here's some sample sales data in its original form. Then with the number of sales counted by hour. Looking at the counts, it's hard to tell at a glance that there were no sales in the 11 o'clock hour.

7. Aggregation with series

To include hours with no sales, generate a series of hours, and then join this to the original data to introduce rows for the missing hours. First, use a WITH clause to create the series of hours from 9am to 2pm and call this hour_series. Then, join this to the sales data, matching the hour from the series to the sales date truncated to the hour. Count the date column, instead of counting the rows, because we don't want to count null values. Group and order by hours to get the count of sales per hour.

8. Aggregation with series: result

The result now includes all hours between 9am and 2pm, with zeros for hours with no sales. We're less likely to overlook that some hours have no sales.

9. Aggregation with bins

If you want to aggregate data by an interval that is not equal to one unit of a date/time field, you can create bins. Recall this strategy from working with numeric data. Let's count sales in 3 hour intervals during the day. First, create two series, one for the lower bound of each bin and one for the upper. The series for the upper bound starts and ends 3 hours after the lower bound. This is the amount of the interval. We alias the result as bins. Then, join bins to the sales data, where the sales date is greater than or equal to the lower bin and less than the upper bin. Then group and order by the bin bounds.

10. Bin result

The result is the count of sales made during each of the three hour intervals.

11. Practice generating series!

OK, time to practice using series with date/time data.