Basic count on a large table on PostgreSQL 14 with 64 GB RAM and 20 threads. Storage is an NVMe disk.
Questions:
- How do I improve this SELECT count query? What kinds of optimizations should I look into in the Postgres configuration?
- The plan shows Workers Planned: 4 but Workers Launched: 0; is that normal?
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM public.product;
Finalize Aggregate  (cost=2691545.69..2691545.70 rows=1 width=8) (actual time=330901.439..330902.951 rows=1 loops=1)
  Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
  I/O Timings: read=36692.273 write=6548.923
  ->  Gather  (cost=2691545.27..2691545.68 rows=4 width=8) (actual time=330901.342..330902.861 rows=1 loops=1)
        Workers Planned: 4
        Workers Launched: 0
        Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
        I/O Timings: read=36692.273 write=6548.923
        ->  Partial Aggregate  (cost=2690545.27..2690545.28 rows=1 width=8) (actual time=330898.747..330898.757 rows=1 loops=1)
              Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
              I/O Timings: read=36692.273 write=6548.923
              ->  Parallel Index Only Scan using points on products  (cost=0.57..2634234.99 rows=22524114 width=0) (actual time=0.361..222958.361 rows=90993600 loops=1)
                    Heap Fetches: 46261956
                    Buffers: shared hit=1963080 read=1140455 dirtied=1908 written=111146
                    I/O Timings: read=36692.273 write=6548.923
Planning:
  Buffers: shared hit=39 read=8
  I/O Timings: read=0.398
Planning Time: 2.561 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.691 ms, Inlining 104.789 ms, Optimization 24.169 ms, Emission 22.457 ms, Total 152.107 ms
Execution Time: 330999.777 ms
Answer:
The workers planned is 4 but launched 0, is that normal?
It can happen when too many concurrent transactions compete for a limited number of allowed parallel workers. The manual:
The number of background workers that the planner will consider using is limited to at most max_parallel_workers_per_gather. The total number of background workers that can exist at any one time is limited by both max_worker_processes and max_parallel_workers. Therefore, it is possible for a parallel query to run with fewer workers than planned, or even with no workers at all. The optimal plan may depend on the number of workers that are available, so this can result in poor query performance. If this occurrence is frequent, consider increasing max_worker_processes and max_parallel_workers so that more workers can be run simultaneously or alternatively reducing max_parallel_workers_per_gather so that the planner requests fewer workers.
You can also optimize overall performance to free up resources, or get better hardware (in addition to ramping up max_parallel_workers).
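A minimal sketch for inspecting and raising those limits; all three are real PostgreSQL settings, but the example values just assume a mostly dedicated 20-thread machine, not a one-size-fits-all recommendation:

-- Inspect the current limits.
SHOW max_worker_processes;
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;

-- Enlarge the pool of parallel workers. max_worker_processes only takes
-- effect after a server restart; the other two can be applied with a
-- reload, or even set per session.
ALTER SYSTEM SET max_worker_processes = 20;           -- restart required
ALTER SYSTEM SET max_parallel_workers = 16;
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
SELECT pg_reload_conf();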
What's also troubling:
Heap Fetches: 46261956
For 90993600 rows that's a heap fetch for roughly every second row, way too many for comfort. An index-only scan is not supposed to do that many heap fetches; it only has to visit the heap for pages not marked all-visible in the visibility map.
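To see how far the visibility map has fallen behind, compare relallvisible to relpages in pg_class; a plain VACUUM (not VACUUM FULL) brings the map up to date. A sketch, assuming the table from the question:

SELECT relpages, relallvisible
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
-- relallvisible close to relpages means an index-only scan rarely
-- needs to visit the heap.

VACUUM (VERBOSE, ANALYZE) public.product;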
Both of these symptoms point to massive concurrent write access (or long-running transactions hogging resources and keeping autovacuum from doing its job). Look into that, and/or tune per-table autovacuum settings for table product to be more aggressive, so that column statistics stay valid and the visibility map can keep up; both checks are sketched below.
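A sketch of both, using the standard pg_stat_activity view and real per-table storage parameters; the one-hour cutoff and scale factors are illustrative, not tuned values:

-- Find long-running transactions that can hold back the visibility map.
SELECT pid, state, xact_start, query
FROM   pg_stat_activity
WHERE  xact_start < now() - interval '1 hour'
ORDER  BY xact_start;

-- Make autovacuum much more eager for this one table.
ALTER TABLE public.product SET (
  autovacuum_vacuum_scale_factor  = 0.02,  -- vacuum after ~2% dead rows instead of the default 20%
  autovacuum_analyze_scale_factor = 0.01   -- refresh statistics more often
);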
Also, with halfway valid table statistics, a (blazingly fast!) estimate might be good enough, as sketched below.
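The simplest estimate reads reltuples straight from pg_class; it is only as fresh as the last VACUUM or ANALYZE:

SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.product'::regclass;
-- reltuples is -1 if the table has never been vacuumed or analyzed.

On a 91-million-row table this single-row catalog lookup returns near-instantly, versus the roughly 331 seconds of the exact count above.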