Why Postgres is doing "Index Scan Backward" to get max ID?


I am running a simple query to get the max ID in a table: SELECT max(ID) FROM t WHERE m=345;

Table (t) has 20 million records and 2000 distinct values for m. There is a primary key index on ID and an index on m. For some reason the explain plan shows an "Index Scan Backward" on the pk index rather than scanning the index on m. The query is taking over 10 seconds to complete.

If I prevent the use of the pk index by changing the SQL ( SELECT max(ID + 0) FROM t WHERE m=345; ) it takes just a few milliseconds to complete. We do a regular vacuum/analyze on the table. I'd prefer not to add "+ 0" to all of the queries to work around this issue. I could probably rewrite this SQL in other ways and get a better result, but the original SQL is so simple that the optimizer should be able to figure out the best plan.

Table and Index DDL:

CREATE TABLE IF NOT EXISTS t
(
    id bigint NOT NULL DEFAULT nextval('t_seq'::regclass),
    c integer,
    m integer,
    b integer DEFAULT '-1'::integer,
    p integer,
    CONSTRAINT t_pkey PRIMARY KEY (id)
)
TABLESPACE pg_default;

CREATE INDEX t_m_index
    ON t USING btree
    (m ASC NULLS LAST)
    TABLESPACE pg_default;

explain plan with workaround and original:

db=> explain (analyze, buffers, format text)
db-> select MAX(id + 0) FROM t WHERE m=345;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=19418.26..19418.27 rows=1 width=8) (actual time=0.047..0.047 rows=1 loops=1)
   Buffers: shared hit=6 read=2
   ->  Bitmap Heap Scan on t  (cost=211.19..19368.82 rows=9888 width=8) (actual time=0.039..0.042 rows=2 loops=1)
         Recheck Cond: (m = 345)
         Heap Blocks: exact=2
         Buffers: shared hit=6 read=2
         ->  Bitmap Index Scan on t_m_index  (cost=0.00..208.72 rows=9888 width=0) (actual time=0.033..0.034 rows=2 loops=1)
               Index Cond: (m = 345)
               Buffers: shared hit=4 read=2
 Planning Time: 0.898 ms
 Execution Time: 0.094 ms
(11 rows)


db=> explain (analyze, buffers, format text)
db-> select MAX(id) FROM t WHERE m=345;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=464.31..464.32 rows=1 width=8) (actual time=21627.948..21627.950 rows=1 loops=1)
   Buffers: shared hit=10978859 read=124309 dirtied=1584
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..464.31 rows=1 width=8) (actual time=21627.945..21627.946 rows=1 loops=1)
           Buffers: shared hit=10978859 read=124309 dirtied=1584
           ->  Index Scan Backward using t_pkey on t  (cost=0.56..4524305.43 rows=9756 width=8) (actual time=21627.944..21627.944 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: (m = 345)
                 Rows Removed by Filter: 11745974
                 Buffers: shared hit=10978859 read=124309 dirtied=1584
 Planning Time: 0.582 ms
 Execution Time: 21627.964 ms
(12 rows)

CodePudding user response:

It looks to me like your data has a skewed distribution which is causing problems here. You mention in the question:

Table (t) has 20 million records and 2000 distinct values for m.

If the database were to assume an equal distribution of values, that would imply that each value of m covers 10,000 rows (20,000,000 / 2,000). And, again assuming an equal distribution, the optimizer could expect to find a record where m = 345 after "randomly" examining about 2,000 records when walking the t_pkey index. This means that the optimizer has to decide which of the two approaches is likely to be less work:

  1. Scanning the t_pkey index in descending order of ID, filtering out results until it finds a record where m = 345. Once found it is done processing and may return the result to the client.
  2. Scanning the t_m_index with entries matching m = 345. It must wade through all of these results to identify the maximum ID value.

Based on our estimates above, the optimizer might guess that the (vast) majority of the cost of the first plan would be scanning ~2,000 index keys and records, while the cost of the second plan would be in scanning ~10,000 index keys and records plus the cost of aggregating those to find the entry with the maximum value. From that perspective the decision to select the former plan seems reasonable.
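That estimate can be spelled out explicitly. Note these are the question's round numbers under a uniform-distribution assumption, not actual planner statistics:

```sql
-- Back-of-the-envelope planner math (hypothetical figures, not pg_statistic values):
SELECT 20000000 / 2000   AS est_rows_per_m_value,           -- 10,000 rows share each m
       20000000 / 10000  AS est_rows_scanned_before_match;  -- ~2,000 rows walked on t_pkey
```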

The problem is that your data is not uniformly distributed, at least not in terms of ordering based on ID in the t_pkey index. From the second explain plan (which uses that index and represents the first of the two plans I mentioned above) we see:

                 Rows Removed by Filter: 11745974

So the database ended up having to scan through more than half of the data (11.7M rows) in descending order of ID before it found the first record where m = 345. This explains why the operation is so slow. But remember that the plan was chosen because it expected to find a value much sooner, so this poor outcome was a surprise to the database.

One option to consider here is a multicolumn index on both m and ID. That would allow the database to retrieve the requested data after examining just a single index key. This approach is most worthwhile if you will be running this query frequently.
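A sketch of such an index (the name t_m_id_idx is illustrative):

```sql
-- Hypothetical multicolumn index: within each m, ids are stored in order,
-- so max(id) for m = 345 is found at one end of that key range.
CREATE INDEX t_m_id_idx ON t USING btree (m, id);
```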

CodePudding user response:

Try a BTREE compound index on (m, id DESC) if you need your query to exploit an index. To satisfy the query, PG descends the index to the first entry matching m=345; because id is sorted descending within each value of m, that first entry holds the largest id, and Bob's your uncle. It's O(log n).

Change your index to

CREATE INDEX t_m_id_index
    ON t USING btree
    (m ASC NULLS LAST, id DESC)  -- add the id column
    TABLESPACE pg_default;
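With that index in place the original query should need no rewriting; a quick sanity check (actual plan output will vary):

```sql
-- The unmodified query can now descend straight to the single matching entry
-- instead of walking t_pkey backward through unrelated rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(id) FROM t WHERE m = 345;
```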