1. Manipulating the granularity of time series data
Let’s take a look at examples of time series data.
2. Time series data
Time series data is a type of date and time data that is ordered. A time series comes from data collected over a period of time and is often represented as a line graph.
We see time series data quite often in our every day life, some of the most common examples are things like stock prices, or temperature measurements.
3. Time series data: train times
Here is a simple time series table giving the time at which a train travels through its stops.
4. Multivariate time series
A time series can be multivariate or have more than one variable, such as the minimum, maximum, and average temperatures at a weather station.
5. Time granularity
Time series can also include details of various precision. This is known as time granularity.
Time granularity is the level of detail the data has. More granularity usually means more precision with the date and time measurements. Common granularities include seconds, minutes, hours, days, weeks, months, quarters, and years.
Here is a time series that is collected at a less common time granularity. This page views data was collected at 20-minute intervals.
6. Changing time granularity
In time series analysis, it may be more helpful to see the data in a different granularity than is currently provided by the data.
Luckily, the DATE_TRUNC function can alter time series data to several levels of granularity, including but not limited to century, decade, quarter, microseconds and more.
DATE_TRUNC accepts two or three arguments. The first is called the field, which is the granular value we wish to have, followed by the source, the time series data we are altering. The third argument is an optional time zone argument. The default is to use the current time zone setting, but adding this third argument will give us the option to change it.
7. Changing time granularity
Here is an example where we have changed the 20-minute interval data to hourly. We have specified we want to extract the hour from the ts source field. Since we are grouping all views by the hour, we need to also include an aggregate function, in this case we have summed all of the views of the 20-minute intervals in each hour.
As we can see, the behavior of DATE_TRUNC gives us a trimmed, or truncated, result. 9:20 becomes 9 o'clock.
8. Extracting time granularity
Another function that works similarly is DATE_PART, however, instead of trimming the data it extracts the specific data we need.
It works in two ways. The first is using the arguments of field, like in DATE_TRUNC, and a timestamp.
The second is using the arguments field, and interval.
The EXTRACT function can also be used to grab precise date and time data and is often recommended instead of DATE_PART. This is because DATE_PART returns a result in the type of double precision. This is an inexact numeric type with 15 decimal digit precision and may sometimes give us imprecise results.
9. Hour granularity
Let's review some examples using EXTRACT and DATE_PART.
Suppose we wanted to look at the number of news article views by hour of day.
Here we use DATE_PART and specify hour as our field, or granular value we want to extract, and ts as the source we are extracting from. We've aggregated again using SUM to group the views by the hour.
10. Day of week granularity
Here’s one more example showing how to do this at day of week granularity, and this time with EXTRACT. Using the dow argument assigns a number to each day of week, ranging from Sunday(0) to Saturday(6).
11. Let's practice!
Let's practice.