Get startedGet started for free

Aggregations

1. Aggregations

One area where BigQuery shines is in using aggregations to summarize data. This section will review common aggregation functions in SQL.

2. Aggregations in BigQuery

Aggregations are operations that summarize large amounts of data into smaller, more concise results. This makes them essential for many data analysis tasks, such as understanding trends, identifying patterns, and making informed decisions. This query shows us some of the most common aggregations used. BigQuery's scalability, columnar storage, and distributed processing architecture make it ideal for performing these queries.

3. GROUP BY and ORDER BY

In BigQuery, GROUP BY and ORDER BY are essential for organizing and sorting aggregated queries. Here, we group sales data by the 'order-id' and calculate the total sales within each category. The GROUP BY clause organizes the data, while ORDER BY arranges the results in descending order based on total cost. These clauses are fundamental for summarizing and arranging aggregated data to produce results required by many common reports.

4. COUNT

As its name implies, "count" will count the number of rows returned by the query. Here, we count the number of records within each category from our product dataset. COUNT is fundamental for understanding dataset sizes and distributions, aiding in various analytical processes within BigQuery.

5. SUM and AVG

SUM and AVG are two aggregate functions that work with numeric data to calculate sums and averages in aggregate. In this query, we calculate each category's total and average sales from the 'sales-data' table. SUM computes the total sales, while AVG calculates the average sales. These functions are fundamental for understanding the overall magnitude and average values within specific categories, aiding in comprehensive data analysis.

6. MIN and MAX

MIN and MAX allow us to find the highest or lowest value in an aggregated query. In this query, we retrieve the minimum and maximum photo counts in the "ecomm-products" table. MIN extracts the product with the lowest number of photos, while MAX fetches the highest. These functions are vital for understanding the range of values within categories, aiding in identifying outliers, and understanding the dataset's boundaries.

7. COUNTIF

The COUNTIF function in BigQuery enables us to count records based on specified conditions within a dataset. Let's look at this example. Using COUNTIF with the condition 'cost greater than 500,' we count the number of records within each category where cost is over $500. This function is valuable for deriving counts based on specific criteria, allowing us to assess the frequency of occurrences that meet the query conditions.

8. HAVING

The HAVING clause in BigQuery filters grouped data based on specified conditions, acting like a WHERE clause but for aggregated data. Here, we utilize HAVING to filter the results to show results for categories with an average item price over $75. This clause is crucial for filtering aggregated data based on aggregated conditions.

9. ANY_VALUE

In BigQuery, the ANY_VALUE function fetches an arbitrary value from a set of grouped data, often used when a single value is needed from non-aggregated columns in a grouped query. Using ANY_VALUE with 'category,' we retrieve an arbitrary product category from each group within the 'category' column in the 'ecomm-products' table. By adding a HAVING clause into ANY_VALUE, we can find the MIN or MAX value from the range of data using an additional column; in this case, the query will return the category with highest cost.

10. Let's practice!

With that let's practice some of these aggregated queries!