The pganalyze Postgres Query Planner Presentation
The pganalyze team held a webinar on 2023-03-29 called “How to use the Postgres query planner to debug bad plans and speed up queries”. Here are the main points.
The Planner
The Postgres query planner’s responsibilities:
- find a good query plan to reduce IO and CPU use
- don’t spend much time and memory finding it
- support Postgres extensions
The Postgres query planner does not:
- find all possible query plans
- change a plan when estimations and expectations are incorrect
- keep track of execution performance in order to pick better plans
Query plan estimate example: (cost=0.56..11859.55 rows=10608 width=53)
- In general, cost is a number used for an internal decision tree to pick the near-best plan.
- The 0.56 is the “startup cost”, which is how much effort it takes to get the first row from the node. This can help for queries with LIMIT.
- The 11859.55 is the “total cost” and covers all the rows, the CPU requirements, and the disk requirements. This is what the query planner is trying to optimize most.
- The rows value is the estimated number of rows output from this path of the plan and is important for joins containing this path.
- The width is the average row width in bytes, used to calculate the final output size.
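For illustration, a plain EXPLAIN (here against a hypothetical orders table) prints one such estimate per plan node; the commented output is only a sketch of the shape:

```sql
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
--  Seq Scan on orders  (cost=0.00..1834.00 rows=10608 width=53)
--    Filter: (status = 'pending'::text)
```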
Poor Estimation
The query plan’s estimate could be wrong if ANDed clauses have dependencies on each other. For example, given a = 1 AND b = 1, the estimate would be off if no rows match both conditions, or if a = 1 is always true whenever b = 1 is.
Join estimates (represented as “Nested Loop”s) are complicated and often a bit off if they lack good statistics.
You can improve the query planner’s ability to estimate by generating extended statistics with CREATE STATISTICS.
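A minimal sketch of this, assuming a hypothetical addresses table whose city and zip_code columns are strongly correlated:

```sql
-- Without extended statistics the planner multiplies the selectivities of
-- city = ... and zip_code = ..., underestimating the matching rows.
CREATE STATISTICS city_zip_deps (dependencies)
    ON city, zip_code
    FROM addresses;

-- Extended statistics are only populated on the next ANALYZE.
ANALYZE addresses;
```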
EXPLAIN (ANALYZE) can provide information about the actual rows, so you can see how far off the row estimates are, a spot where you can try to improve the planner’s estimation or selection. The pganalyze team recommends running it twice in a row so that you have a better chance of accounting for caching behavior.
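A sketch of what to compare, again against a hypothetical table:

```sql
EXPLAIN (ANALYZE)
SELECT * FROM orders WHERE status = 'pending';
-- In each node, compare the planner's estimate ("rows=10608") with the
-- measured value ("actual time=... rows=27 loops=1"); a large gap is a
-- candidate for better statistics.
```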
Scan/Join
The planner’s main responsibility is to determine the most efficient methods
to scan the tables (base rels) and combine them into the final output (join
rel). If I have tables A
, B
, and C
, postgres will try to figure out the
best join order: (A x B) x C
, (A x C) x B
, or A x (B x C)
. If 12 or more
tables are joined, Postgres uses the Genetic Query Optimizer (GEQO), which
supposedly will make worse join orders than the normal query optimizer.
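That cutoff is the geqo_threshold setting (default 12), which you can inspect or adjust:

```sql
SHOW geqo_threshold;       -- GEQO kicks in at this many FROM items (default 12)
SET geqo_threshold = 14;   -- keep exhaustive join planning for up to 13 relations
```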
Parameterized Index Scans
Join order matters: it can make the difference between using a Seq Scan or an Index Scan within a nested loop. You can encourage the use of an existing index by:
- adding WHERE filters for the table being Seq Scanned
- filtering by way of a JOIN with another table, which is possible if it can be a Parameterized Index Scan (inner side of a nested loop)
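A sketch of the second case, assuming hypothetical customers and orders tables with an index on orders.customer_id:

```sql
EXPLAIN
SELECT *
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'EMEA';
-- On the inner side of a Nested Loop, look for something like:
--   -> Index Scan using orders_customer_id_idx on orders o
--        Index Cond: (customer_id = c.id)
-- Each outer row's c.id parameterizes an inner index lookup, avoiding a
-- Seq Scan over orders.
```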
Bounded Sorts
A bounded sort is a query wherein you have an ORDER BY and a LIMIT. These can be hugely optimized with a btree index that includes the sorted field, even if it is an index that combines the unique key (e.g. id) and the sorted field (e.g. created_at). This is because btrees are already sorted.
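A minimal sketch against a hypothetical posts table:

```sql
CREATE INDEX posts_created_at_idx ON posts (created_at);

-- With the index, the planner can walk the btree in order and stop after
-- 10 rows ("Limit" over an "Index Scan") instead of sorting the whole table.
EXPLAIN
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 10;
```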
It’s important to note that the order of fields in an index is significant: the btree is sorted by the first field, then by each subsequent field within ties. Any fields in an index’s INCLUDE clause are excluded from sorting and uniqueness considerations.
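For example (index, table, and column names hypothetical):

```sql
-- Sorted and unique on (created_at, id); title is merely stored in the
-- leaf pages so index-only scans can return it without visiting the heap.
CREATE UNIQUE INDEX posts_created_at_id_key
    ON posts (created_at, id)
    INCLUDE (title);
```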
auto_explain for older query instances
EXPLAIN (ANALYZE) gives you information about the data as it exists right now, which may be different from the state of the data at the time the query was run before. Perhaps the data is no longer in the cache, or statistics have since changed, resulting in a different plan.
The pganalyze presenter recommended this configuration for auto_explain, which logs plans at the time of the query:
auto_explain.log_min_duration = 1000ms
auto_explain.log_format = json
auto_explain.log_timing = off
auto_explain.log_analyze = on
auto_explain.log_buffers = on
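Note that auto_explain has to be loaded before these settings apply; as a sketch, for a single session:

```sql
-- For one session only; server-wide, set
-- shared_preload_libraries = 'auto_explain' in postgresql.conf instead.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1000ms';
```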
EXPLAIN options
EXPLAIN on its own gives you the planner’s chosen plan. You can pass multiple options as its second parameter. The presentation only covered these:
- ANALYZE: Adds runtime statistics
- BUFFERS: Adds I/O statistics; usually 1 buffer = one 8 kB page
Buffers can assist with discovering the impact of dead rows that may need to be vacuumed.
A shared hit stat within a loop may be exaggerated: because the same part of the index may be examined in each iteration, the real number of shared hits is anywhere from the cost of a single loop iteration up to that maximum.
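A sketch against a hypothetical table:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 123;
-- Each node gains a line like "Buffers: shared hit=42 read=7":
--   hit  = 8 kB pages found in shared_buffers
--   read = pages fetched from the OS page cache or disk
```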
How to Guide the Planner
To understand the planner’s decisions, first force it to the good plan. Remember that the plan choice was based on cost, which may be poorly estimated, so the good plan may have a larger cost than the chosen bad plan.
Forcing alternative plans on the planner is somewhat indirect. One way to achieve this is by disabling techniques that the planner might typically choose.
For example:
SET enable_seqscan = off
Some session flags you might play with:
- enable_seqscan
- enable_mergejoin
- enable_hashjoin
- enable_memoize
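A sketch of scoping such an experiment to one transaction so the flag cannot leak into other queries (table hypothetical):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;   -- reverts automatically at transaction end
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
ROLLBACK;
```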
Hash and Merge Joins are usually worse in performance than Nested Loops and Index Scans.
You can also use pg_hint_plan, noted below, as a way to force plan choices to get a better understanding of the alternative plan cost.
Pinning Plans (Brute Force)
You can use the pg_hint_plan extension, which allows you to add query hints through the hint table (hint_plan.hints) or in this form:
/*+ HashJoin(a b) */ SELECT ...
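For the hint-table route, a minimal sketch based on pg_hint_plan’s documented table layout (the query itself is hypothetical; constants are normalized to ?):

```sql
SET pg_hint_plan.enable_hint_table = on;

INSERT INTO hint_plan.hints (norm_query_string, application_name, hints)
VALUES ('EXPLAIN SELECT * FROM orders WHERE customer_id = ?;',
        '',                    -- empty = applies to every application
        'IndexScan(orders)');
```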
A more complicated example from the slides:
/*+ Leading((scs schema_tables))
IndexOnlyScan(scs index_schema_column_stats_on_table_id)
IndexScan(schema_tables schema_tables_pkey)
Set(enable_memoize off)
*/
EXPLAIN
SELECT EXISTS (
SELECT 1
FROM schema_column_stats scs
WHERE scs.invalidated_at_snapshot_id IS NULL
AND scs.table_id IN (
SELECT id
FROM schema_tables
WHERE invalidated_at_snapshot_id IS NULL
AND database_id = 12345
)
);
If you are using AWS as your hosting provider, an alternative option is to leverage Aurora PostgreSQL’s query plan management (QPM) feature.
The pganalyze team recommends choosing better statistics over planner hints whenever possible.
Optimization Summary
The video summarizes some of the optimization strategies and adds a couple:
- For simple scan selectivity, look into CREATE STATISTICS
- For join selectivity, try increasing the statistics target (see the sketch after this list)
- Review cost settings (e.g. random_page_cost)
- Create multi-column indexes that align with the planner’s biases (e.g. for bounded sorts)
- For complex queries with surprising join order, try forcing materialization (WITH x AS MATERIALIZED ...)
- For multi-tenant apps, consider adding more explicit clauses like WHERE customer_id = 123
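A sketch of that second item, raising the per-column statistics target (default 100) for a hypothetical join column:

```sql
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;  -- resample so the larger target takes effect
```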