Get startedGet started for free

Applying statistical aggregates to time series data

1. Applying statistical aggregates to time series data

Welcome to this video on applying statistical aggregates to Time Series Data.

2. Statistical aggregates

Functions such as MIN, MAX, SUM, AVG, and COUNT aggregate data. We can take this one step further with statistical aggregates which includes metrics such as means and medians.

3. Calculating the average

Here we have a query that calculates the average number of views per 20-minute sample interval by averaging the views and casting them to an integer.

4. Average number of views per day

Suppose we want the average number of views per hour instead. Here we have a similar query, but now the common table expression gives us the data per day by converting the time granularity. This way the query can calculate the average number of views per day per time series.

5. Average number of views per day

The result is the average number of views per day for each news article.

6. Average number of views per day

Another way to average the number of views per day per news article is use COUNT DISTINCT. Note that the change of time granularity is embedded in the COUNT DISTINCT operation.

7. Discrete and continuous medians

Now let's turn to statistical aggregates. The discrete median picks the first value that is closest to the middle of the dataset, always returning a value from the dataset. The continuous median returns a cut point that divides the dataset into two; giving an average of the two innermost values if the dataset is even length. Here is an example using an odd and even number of data points. The discrete medians would be three and two, respectively, and the continuous medians would be three and two and a half, respectively.

8. Ordered-set aggregate functions

These median calculations can be done using PERCENTILE_DISC, and PERCENTILE_CONT. Here we demonstrate each function on two small data series having five elements. Each item in the VALUES clause contains an id and value. The median is being calculated on the second element, the value, of each item. Because each data series is of odd length, the median is given by the middle item. Notice we have used the WITHIN GROUP clause to order a subset of the data we wish to find medians for. Combining the aggregate functions with this clause creates an ordered-set aggregate function.

9. Ordered-set aggregate functions

Here is the result. PERCENTILE_CONT returns a float, whereas PERCENTILE_DISC returns a value from the table.

10. Median, quantile, percentile, quartile

What we have been doing is creating quantiles of our data. A median is a type of percentile, which is a type of quantile. A quantile divides a sample into almost equal sized subsets. Common quantiles have special names, such as quartiles (four subsets), and deciles (ten subsets).

11. Calculating quartiles

This query calculates the quartiles for a small sample using PERCENTILE_DISC. There is always one fewer quantile than the number of groups created. This means that there are three quartiles. To obtain the first quartile we pass in the fraction zero point two-five. The second quartile corresponds to the median, which we obtain by passing in the fraction zero point five. We obtain the third quartile by passing in the fraction zero point seven-five.

12. Calculating an array of discrete quartiles

We can calculate all three quartiles in a single call by passing in the quartile fractions as an array. An array is like a list of values. It is specified with square brackets. The return result is an array. The same can be done for the continuous version.

13. Let's practice!

Let's apply what we've learned.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.