Get startedGet started for free

Partitioning and window functions

1. Partitioning and window functions

PostgreSQL allows us to split a large table into smaller subsets, or partitions.

2. Partitioning

This can be useful when dealing with large datasets as we can separate out only the part we want to query. With time series analysis, it may be the case that we continue to add to the data every day with new information such as the latest temperature or stock prices. We could end up with a very large table! A common way to partition for time series analysis is by range. This segments a set of fields, or columns, representing a range of dates.

3. A range

The boundaries of a range are inclusive at the lower end and exclusive at the upper end. For example, if our first range was from the year 2000 to 2010 and the second range from 2010 to 2020, 2010 would belong to the second range. We can also create a partition by event or category.

4. Partition by

Let's see what this looks like in code. We can create a partitioned table by using the PARTITION BY clause and the method used for partitioning, usually RANGE. Here is an example where we have created a partitioned timetable using the range method on the date_info field which contains the dates the trains are running.

5. Partition of

The next step is to create the actual partitions. This is accomplished with PARTITION OF and FOR VALUES FROM. Here we create a partition for data in the year 2020 only. Our partitioned table, timetable, is already set up to be partitioned on the date_info field. Now, all we need to do is specify the range for our partition. We take a partition of the data from the first of January 2020 to December 31st 2020.

6. Window functions

Let’s switch gears for a moment and talk about window functions. SQL window functions express certain very useful operations more simply than regular queries. A window function returns a value for each row in a table – but this value can depend on other rows. The window describes the set of rows on which the function operates. The value returned for each row can be a value from a row in the “window”, or, a value from a “window function” that uses rows in the window to calculate its value. We can compare them to aggregate functions; only the result is not grouped into one value and we see all of the relevant rows for that group.

7. Over clause

We can spot a window function by the OVER clause; every window function has one. Here we have a regional timetable that has date and time information, including train ids, delays, and the region the train operates in. We use the OVER clause with the PARTITION BY clause to show the train delays per train compared to the average delays per region. In this case, the aggregate function for average is the window function used for each window, the regions. What we see here is the average value for all rows where region is north for the current row, until the region changes and a new average is calculated. If the PARTITION BY clause is not provided, the window function treats the entire table as one partition.

8. Let's practice!

Let's practice.