Get Started

Query lifecycle and the planner

1. Query lifecycle and the planner

Now that you have learned different ways to restructure queries let's relate the structure to the query lifecycle.

2. Basic query lifecycle

The lifecycle has three main parts. The parser works first to check syntax, use stored rules, and translate SQL into machine-readable code. The planner and optimizer take this readable code and find the query tasks. They use database statistics such as row counts to create a query plan. They calculate the costs of each plan to choose the best plan. Finally, the executor returns the query results. It follows the optimized query plan to execute the SQL instructions.

3. Query planner and optimizer

The query planner and optimizer adjust with SQL structure changes so can be impacted by changing your query. The planner generates a series of plan trees with many nodes, or steps. The trees are unique plans that vary the query steps order. Plans are best visualized through the output of the EXPLAIN function. The planner estimates the costs of each plan by using row counts and other metadata in the pg_tables schema. The optimizer chooses the best of these plans. Postgres uses time as its cost metric for optimization.

4. Statistics from pg_tables

The query planner uses the pg_class table and pg_stats view to create its plan and cost estimates. They have an indicator of tables with indexes. They store information such as the number of null values, column width, and distinct values.

5. EXPLAIN

EXPLAIN provides a window into the query plan. Place EXPLAIN before any query to see the steps and cost estimates. EXPLAIN does not run the query; it only provides estimates. The query plan for the simple SELECT * query has one step. It is a sequential scan of the cheese table. This one step includes cost and size estimates.

6. EXPLAIN: Scan

Sequential scan is the single step in this query plan. A sequential scan is a scan of all the rows in a table.

7. EXPLAIN: Cost

The cost estimate of the sequential scan is dimensionless. You use it to compare two structures of a query with the same output. You should not use cost to compare queries with different output. The first cost number is 0. It describes the startup cost. The second number, 10.50, describes the total time. The total time equals the startup time plus the run time.

8. EXPLAIN: Size

The size estimates cover rows and width. Rows provides an estimate of the rows the query examines in order to run. Width provides an estimate of the byte width of the total rows.

9. EXPLAIN with a WHERE clause

The cheese query now uses a WHERE clause to filter to goat and sheep cheeses. The query plan output is again shown using EXPLAIN. The WHERE clause adds a step. Query plans are read bottom to top. The filter step occurs first. After filtering to only goat and sheep records, the planner performs a sequential scan of the cheese table. Using the WHERE clause, the number of rows decreased to 3. The overall cost estimates increased because this is a more complicated query.

10. EXPLAIN with an index

The cheese query was again modified to include an index on the species column. Recall that a column with an index looks the same as a column without an index. The index exists on the back end to help performance. The query plan is now a single index scan step. The Index Condition describes this step, showing that species is the column with an index. The index scan scans the species_index instead of the cheeses table. The start-up cost has increased from zero while the total cost has decreased to 12.66.

11. Let's practice!

Now it's your turn to practice using EXPLAIN to understand the query planner steps.