1. Query structure and query execution
You just learned how to read and interpret query plans. Now let's tie those execution plans back to queries by highlighting a few structural choices.
2. Subqueries and joins
Two common structures for both joining and limiting data are subqueries and joins. As long as the subqueries occur in the SELECT or WHERE clauses, the query planner treats them the same as joins. These examples count the athletes in countries with an annual temperature greater than 22 degrees Celsius, using equivalent subquery and join structures.
3. Query plan
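As a concrete sketch, the equivalent subquery and join versions, and the plan they share, might look like the following. The column names (country, annual_temp) and the exact plan text are assumptions, since the actual schema isn't shown:

```sql
-- Subquery version: countries are filtered inside the WHERE clause.
SELECT COUNT(*)
FROM athletes
WHERE country IN (
    SELECT country
    FROM climate
    WHERE annual_temp > 22
);

-- Join version: the planner produces the same plan.
SELECT COUNT(*)
FROM athletes AS a
JOIN climate AS c
    ON a.country = c.country
WHERE c.annual_temp > 22;

-- EXPLAIN (COSTS OFF) prints a plan along these lines for both:
--
--  Aggregate
--    ->  Hash Join
--          Hash Cond: (a.country = c.country)
--          ->  Seq Scan on athletes a
--          ->  Hash
--                ->  Seq Scan on climate c
--                      Filter: (annual_temp > 22)
```

Reading bottom to top: climate is scanned and filtered, the filtered rows are hashed, athletes is scanned, the two are hash-joined, and the result is aggregated into a count.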
The execution plan for both queries is the same. Here, the cost parameters that usually print are replaced by empty parentheses () to make it easier to focus on the query steps. The order of the query plan is the same for both query structures. Query plans are read from bottom to top, so the first step is a sequential scan on the climate table. This step applies the annual temperature filter. The filtered rows from the climate table are stored in temporary, or hash, storage. Next, the athletes table is sequentially scanned. The tables are joined using a hash join. Finally, all the records are aggregated to attain a count.
4. Common table expressions and temporary tables
Common table expressions (CTEs) and temporary tables are additional ways to combine data, especially large datasets. In terms of the query plan, CTEs are equivalent to temporary tables. These examples again count the athletes in countries with an annual temperature greater than 22 degrees Celsius, this time using equivalent CTE and temporary table structures.
5. Query plan
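A sketch of the two equivalent structures whose plans are compared here, with assumed column names (country, annual_temp):

```sql
-- CTE version: the celsius expression appears as a CTE Scan in the plan.
WITH celsius AS (
    SELECT country, annual_temp
    FROM climate
)
SELECT COUNT(*)
FROM athletes AS a
JOIN celsius AS c
    ON a.country = c.country
WHERE c.annual_temp > 22;

-- Temporary table version: the plan is nearly identical,
-- with minor naming differences.
CREATE TEMP TABLE celsius AS
SELECT country, annual_temp
FROM climate;

SELECT COUNT(*)
FROM athletes AS a
JOIN celsius AS c
    ON a.country = c.country
WHERE c.annual_temp > 22;
```

Note that since PostgreSQL 12, the planner may inline a plain CTE into the outer query; writing `WITH celsius AS MATERIALIZED (...)` forces the separate CTE Scan behavior described here.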
The execution plan for the CTE query is very similar to the temporary table version, with minor naming differences. The first step is a CTE scan on the celsius table. Unlike in the prior subquery and join example, the filter condition is not applied in this first scan step. This means that each subsequent step searches through more records, since records of all temperatures remain. After the CTE scan, the records are temporarily stored in the hash step. The athletes table undergoes a sequential scan, followed by a hash join of the two tables. The climate CTE undergoes another sequential scan, and here is where the temperature filter is applied. Finally, all the records are aggregated to attain a count.
6. Limiting the data
Because Postgres inherently uses row storage, limiting the number of records to search and return speeds up query performance. The simplest way to limit the data is by adding filter conditions. Filtering on columns with an index results in a faster search. This example looks at athletes in the 2010 and 2014 Olympics.
7. Limiting the data
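The before-and-after comparison might be run like this; the index name is hypothetical:

```sql
-- Before: with no index on year, the filter forces a sequential scan.
EXPLAIN ANALYZE
SELECT *
FROM athletes
WHERE year IN (2010, 2014);

-- Create an index on the filtered column.
CREATE INDEX idx_athletes_year ON athletes (year);

-- After: the planner can now use an index or bitmap scan on year,
-- and EXPLAIN ANALYZE reports faster planning and execution.
EXPLAIN ANALYZE
SELECT *
FROM athletes
WHERE year IN (2010, 2014);
```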
The query was run before and after an index was created on the year column. As you can see, query planning and execution are faster when the year column is indexed.
8. Aggregations - different granularities
Joins bring data together. Joining a table that records data at a less granular level duplicates its values at the lowest granularity. This example uses two tables. The athletes table has an entry for every athlete in every event. The regions table has an entry for every country with athletes. A region can have more than one competing athlete. This means that joining the tables causes the region data to repeat, or duplicate, for every athlete.
9. Aggregations - changing the granularity
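A sketch of the two orderings, assuming the athletes table has a country column that matches regions (the column names are assumptions):

```sql
-- Join first: region values duplicate for every athlete row,
-- so the aggregate works through the larger joined set.
SELECT r.region, COUNT(*) AS n_athletes
FROM athletes AS a
JOIN regions AS r
    ON a.country = r.country
GROUP BY r.region;

-- Aggregate first: collapse athletes per country in a CTE,
-- then join at the same granularity as regions.
WITH per_country AS (
    SELECT country, COUNT(*) AS n_athletes
    FROM athletes
    GROUP BY country
)
SELECT r.region, SUM(pc.n_athletes) AS n_athletes
FROM per_country AS pc
JOIN regions AS r
    ON pc.country = r.country
GROUP BY r.region;
```

Both queries return the same counts; the second typically executes faster because the join handles one row per country instead of one row per athlete-event entry.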
The athletes table has many more rows than the regions table. Aggregating in a CTE prior to joining results in a faster execution time. The table shows the query execution times: joining first, with the different table granularities, executes more slowly than the CTE aggregation query.
10. Let's practice!
Now that we've reviewed the options in query structure, let's practice rearranging some queries and understanding the changes in query cost and execution.