Home > Blockchain >  Partial index not used on count
Partial index not used on count

Time:07-14

Hello and thanks in advance to anyone who will take some time to answer me. This is a question to learn how to use indexes efficiently on count queries.

Version Postgresql 12.11

I have a huge table lots (~15M rows) which should have been partitioned by status (an integer column with actual values between 0 and 3), but it has not.

Here is how the data is distributed on it:

-- lots by status
SELECT status, count(*), ROUND((count(*) * 100 / SUM(count(*)) OVER ()), 1) AS "%"
  FROM lots
GROUP BY status
ORDER BY count(*) DESC;
status count %
2 ~13.3M 90%
0 ~1.5M 10%
1 ~6K ~0%
NULL ~0.5K ~0%

I also have those indexes on it:

tablename indexname num_rows table_size index_size unique number_of_scans tuples_read tuples_fetched
lots index_lots_on_status 1.4742644e 07 5024 MB 499 MB N 3451 7060928281 134328966
lots pidx_active_lots_on_id 1.4742644e 07 5024 MB 38 MB Y 23491795 1496103827 2680228

where the pidx_active_lots_on_id is a partial index defined as follow:

CREATE UNIQUE INDEX CONCURRENTLY "pidx_active_lots_on_id" ON "lots" ("id" DESC) WHERE status = 0;

As you can see, the partial index on lots with status = 0 is "only" 38MB (against the 0.5GB of the full status index).

I've introduced the latter index to try to optimise this query:

SELECT count(*) FROM lots WHERE status = 0;

because the count of the lots on status 0 is the most common count case for that table, but for some reason the index seems to be ignored.

I also tried to perform a more specific query:

SELECT count(id) FROM lots WHERE status = 0;

with this second query, the index is used, but with worst results.

NOTE: I also ran an ANALYSE lots; after the introduction of the index.

My questions are:

  1. Why is the index partial index ignored on the first count case (count(*))?
  2. Why is the second query performing worst?

Detail on plan:

EXPLAIN(ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT COUNT(*) FROM lots WHERE lots.status = 0

Aggregate  (cost=539867.77..539867.77 rows=1 width=8) (actual time=16517.790..16517.791 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=79181 read=287729 dirtied=16606 written=7844
  I/O Timings: read=14040.416 write=58.453
  ->  Index Only Scan using index_lots_on_status on public.lots  (cost=0.11..539125.83 rows=1483881 width=0) (actual time=0.498..16238.580 rows=1501060 loops=1)
        Output: status
        Index Cond: (lots.status = 0)
        Heap Fetches: 1545139
        Buffers: shared hit=79181 read=287729 dirtied=16606 written=7844
        I/O Timings: read=14040.416 write=58.453
Planning Time: 1.856 ms
JIT:
  Functions: 3
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.466 ms, Inlining 80.076 ms, Optimization 15.797 ms, Emission 12.393 ms, Total 108.733 ms
Execution Time: 16568.670 ms



EXPLAIN(ANALYZE, COSTS, VERBOSE, BUFFERS)
SELECT COUNT(id) FROM lots WHERE lots.status = 0

Aggregate  (cost=660337.71..660337.72 rows=1 width=8) (actual time=32127.686..32127.687 rows=1 loops=1)
  Output: count(id)
  Buffers: shared hit=80426 read=334949 dirtied=3 written=75
  I/O Timings: read=11365.273 write=22.365
  ->  Bitmap Heap Scan on public.lots  (cost=11304.17..659595.77 rows=1483887 width=4) (actual time=3783.122..30680.836 rows=1501176 loops=1)
        Output: id, url, title, ... *(list of all of the 32 columns)*
        Recheck Cond: (lots.status = 0)
        Heap Blocks: exact=402865
        Buffers: shared hit=80426 read=334949 dirtied=3 written=75
        I/O Timings: read=11365.273 write=22.365
        ->  Bitmap Index Scan on pidx_active_lots_on_id  (cost=0.00..11229.97 rows=1483887 width=0) (actual time=2534.845..2534.845 rows=1614888 loops=1)
              Buffers: shared hit=4866
Planning Time: 0.248 ms
JIT:
  Functions: 5
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.170 ms, Inlining 56.485 ms, Optimization 474.508 ms, Emission 205.882 ms, Total 738.045 ms
Execution Time: 32169.349 ms

CodePudding user response:

The partial index may be smaller, but the size of the part of the index that needs to be read will be about the same between the two indexes. Skipping the parts of the complete index which are for the wrong "status" will be very efficient.

Your table is obviously not very well vacuumed, based on both the heap fetches and the number of buffers dirtied and written. Try vacuuming the table and then repeating the queries.

  • Related