Get startedGet started for free

Using windows for statistical functions

1. Using windows for calculating statistics

In this lesson, you will use windows functions to calculate the standard deviation and mode of a column.

2. Calculating the standard deviation

Calculating the standard deviation is a common task in data science as it is important to understand the statistical distributions of the numeric columns. Using windows functions, you can either calculate the standard deviation for the entire table or for each window using the STDEV function.

3. Calculating the standard deviation for the entire table

Here we calculate the standard deviation of the CurrentQuota column for the entire table. Since no columns are listed in OVER, we did not include PARTITION BY, only one window is created for the entire table. As a result, only one value is repeated across the entire StandardDev column.

4. Calculating the standard deviation for each partition

In this example, the OVER clause is partitioning and ordering by SalesYear, thus creating one window for each unique value in SalesYear. As a result, the standard deviation is calculated for each window. When SalesYear changes, the standard deviation also changes.

5. Calculating the mode

Now let's look at how to calculate the mode. The mode is a value which appears most often in your data. Unlike standard deviation, there is no function to calculate mode. To calculate the mode, you first need to create a common table expression containing an ordered count of values using ROW_NUMBER. Then, write a query using a CTE to pick the value with the highest row number. Let's do this to calculate the mode of CurrentQuota!

6. Calculating the mode in T-SQL (I)

First, we create a CTE. We do this by creating windows by partitioning and ordering by CurrentQuota and assigning a row number for each row in each windows. As you can see from the CTE here, only seventy thousand appears twice, so it has two row numbers.

7. Calculating the mode in T-SQL (II)

You can use this CTE to write a query to return the value with the highest row number, as shown here. This query uses a WHERE clause to filter the data to only show values equal to the maximum value in QuotaList. The result is seventy thousand since seventy thousand occurs the most number of times, twice in CurrentQuota.

8. Let's practice!

It's time to calculate the standard deviation and mode by yourself!