1. Aggregating with different data granularities
Now let's examine why and when to aggregate data.
2. Data granularity - level of detail
Each row of a table is unique based on the granularity or level of detail. The level of detail can come from one or more columns.
Imagine a database describing video games. One table has an entry for each video game. Game is the lowest level of detail. The specific game name is the column that describes a unique record. Since each game also has a unique id, the id column also describes a unique record.
Another table, the Game_Platforms table, lists the platforms for each game. It shows that the Legend of Zelda was released on the FCDS, GameCube, and Wii. There are multiple rows per game, so game is no longer the table grain. Instead, game and platform together describe each unique row. Two columns then, game_id and platform, are needed to describe the granularity.
3. Joining different granularities
Suppose you want to know the number of platforms for each game and the most recent, or last, platform on which the game was available. You join the Video_Games and Game_Platforms tables.
Focusing on just The Legend of Zelda game, you see it maps to multiple records, or platforms, in the Game_Platforms table.
4. Joining different granularities
Instead of selecting a single game's most recent platform, you have added records for each platform for that game. The Legend of Zelda game went from 1 record in the Video_Games table to the 3 records shown here. The output table is now of a game-platform grain.
5. Setting up a granularity change
To return the results at the games level, you can turn the query into two steps. The first step will set the stage for easy identification of the most recent year's platform, the Wii.
6. Setting up a granularity change
The SQL and output here look similar to what you saw a few slides ago. The next step is to limit this result to the most recent platform, the Wii.
You can then join this Wii limited output to the Video_Games table.
7. CTEs revisited
First, let's take an interlude to revisit CTEs. Recall that a common table expression, or CTE, is a standalone query that returns a temporary results set.
The query is wrapped in a WITH function, and the results can be referenced in later SELECT statements.
CTEs are the key to this two step query that will allow us to join the aggregated results of a table to another table.
8. CTEs to the rescue
The step 1 query is shown here as a CTE called platforms_cte. Placing it into a CTE makes the query output available for the next step.
The subsequent SELECT statement joins the games and platform tables similarly to our initial query. However, there is now an additional join condition. This second condition selects only the single Wii record from the CTE.
9. CTEs to the rescue
The join condition limits the CTE results to the single most recently released platform record. The output is now just one row and still shows the number of platforms and the most recently released platform.
10. Matching data granularity when joining
Joining tables of matching data granularity improves query performance.
No values are repeated. No duplicates are created. Consequently, this reduces the results size to the minimum required number of records. Additionally, it is much easier to avoid double counting when duplicates do not exist.
11. Let's practice!
Now it is your turn to practice aggregating to match data granularities and improve query performance.