Basic count on a large table on PostgreSQL 14 with 64 GB RAM and 20 threads. Storage is an NVMe disk.
Questions:
- How do I improve this SELECT count query? What kinds of optimizations should I look into in the Postgres configuration?
- The plan shows Workers Planned: 4 but Workers Launched: 0; is that normal?
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM public.product;
Finalize Aggregate  (cost=2691545.69..2691545.70 rows=1 width=8) (actual time=330901.439..330902.951 rows=1 loops=1)
  Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
  I/O Timings: read=36692.273 write=6548.923
  ->  Gather  (cost=2691545.27..2691545.68 rows=4 width=8) (actual time=330901.342..330902.861 rows=1 loops=1)
        Workers Planned: 4
        Workers Launched: 0
        Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
        I/O Timings: read=36692.273 write=6548.923
        ->  Partial Aggregate  (cost=2690545.27..2690545.28 rows=1 width=8) (actual time=330898.747..330898.757 rows=1 loops=1)
              Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
              I/O Timings: read=36692.273 write=6548.923
              ->  Parallel Index Only Scan using points on products  (cost=0.57..2634234.99 rows=22524114 width=0) (actual time=0.361..222958.361 rows=90993600 loops=1)
                    Heap Fetches: 46261956
                    Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
                    I/O Timings: read=36692.273 write=6548.923
Planning:
  Buffers: shared hit=39 read=8
  I/O Timings: read=0.398
Planning Time: 2.561 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.691 ms, Inlining 104.789 ms, Optimization 24.169 ms, Emission 22.457 ms, Total 152.107 ms
Execution Time: 330999.777 ms
Answer:
The workers planned is 4 but launched 0, is that normal?
It can happen when too many concurrent transactions compete for a limited number of allowed parallel workers. The manual:
The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather. The total number of background workers that can exist at any one time is limited by both max_worker_processes and max_parallel_workers. Therefore, it is possible for a parallel query to run with fewer workers than planned, or even with no workers at all. The optimal plan may depend on the number of workers that are available, so this can result in poor query performance. If this occurrence is frequent, consider increasing max_worker_processes and max_parallel_workers so that more workers can be run simultaneously or alternatively reducing max_parallel_workers_per_gather so that the planner requests fewer workers.
You can also optimize overall performance to free up resources, or get better hardware (in addition to ramping up max_parallel_workers).
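A minimal sketch for inspecting and raising those limits; all three are real PostgreSQL settings, but the example values just assume a mostly dedicated 20-thread machine, not a one-size-fits-all recommendation:

-- Inspect the current limits.
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- Enlarge the pool of parallel workers. max_worker_processes only takes
-- effect after a server restart; the other two can be applied with a
-- reload, or even set per session.
ALTER SYSTEM SET max_worker_processes = 20;           -- restart required
ALTER SYSTEM SET max_parallel_workers = 16;
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
SELECT pg_reload_conf();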
What's also troubling:
Heap Fetches: 46261956
For 90993600 rows that's a heap fetch for roughly every second row, way too many for comfort. An index-only scan is not supposed to do that many heap fetches; it only has to visit the heap for pages not marked all-visible in the visibility map.
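To see how far the visibility map has fallen behind, compare relallvisible to relpages in pg_class; a plain VACUUM (not VACUUM FULL) brings the map up to date. A sketch, assuming the table from the question:

SELECT relpages, relallvisible
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
-- relallvisible close to relpages means an index-only scan rarely
-- needs to visit the heap.

VACUUM (VERBOSE, ANALYZE) public.product;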
Both of these symptoms point to massive concurrent write access (or long-running transactions hogging resources and keeping autovacuum from doing its job). Look into that, and/or tune per-table autovacuum settings for table product to be more aggressive, so that column statistics stay valid and the visibility map can keep up; both checks are sketched below.
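A sketch of both, using the standard pg_stat_activity view and real per-table storage parameters; the one-hour cutoff and scale factors are illustrative, not tuned values:

-- Find long-running transactions that can hold back the visibility map.
SELECT pid, state, xact_start, query
FROM   pg_stat_activity
WHERE  xact_start < now() - interval '1 hour'
ORDER  BY xact_start;

-- Make autovacuum much more eager for this one table.
ALTER TABLE public.product SET (
  autovacuum_vacuum_scale_factor  = 0.02,  -- vacuum after ~2% dead rows instead of the default 20%
  autovacuum_analyze_scale_factor = 0.01   -- refresh statistics more often
);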
Also, with halfway valid table statistics, a (blazingly fast!) estimate might be good enough, as sketched below.
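The simplest estimate reads reltuples straight from pg_class; it is only as fresh as the last VACUUM or ANALYZE:

SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
-- reltuples is -1 if the table has never been vacuumed or analyzed.

On a 91-million-row table this single-row catalog lookup returns near-instantly, versus the roughly 331 seconds of the exact count above.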