Home > Software design >  Postgres "Index Scan Backward" slower than "Index Scan"
Postgres "Index Scan Backward" slower than "Index Scan"

Time:12-17

I have the following basic table in PostgreSQL 12.13:

Record:
  database_id: FK to database table
  updated_at: timestamp

I've created an index on both the database_id and updated_at fields.

I have a query that fetches the most recent 100 records for a given database id:

SELECT * FROM record WHERE database_id='fa9bcfa6-8d89-4c95-b04a-24c85b169066'
ORDER BY store_record.updated_at DESC
LIMIT 100;

This query is EXTREMELY slow (recently took about 6 min to run). Here is the query plan:

Limit  (cost=0.09..1033.82 rows=100 width=486)
  ->  Index Scan Backward using record_updated_at on record  (cost=0.09..8149369.05 rows=788343 width=486)
        Filter: (database_id = 'fa9bcfa6-8d89-4c95-b04a-24c85b169066'::uuid)

If I change ORDER BY DESC to ORDER BY ASC then the query takes milliseconds, even though the query plan looks about the same:

SELECT * FROM record WHERE database_id='fa9bcfa6-8d89-4c95-b04a-24c85b169066'
ORDER BY store_record.updated_at
LIMIT 100;

Limit  (cost=0.09..1033.86 rows=100 width=486)
  ->  Index Scan using record_updated_at on record  (cost=0.09..8149892.78 rows=788361 width=486)
        Filter: (database_id = 'fa9bcfa6-8d89-4c95-b04a-24c85b169066'::uuid)

If I remove the ORDER BY completely then the query is also fast:

SELECT * FROM record WHERE database_id='fa9bcfa6-8d89-4c95-b04a-24c85b169066'
LIMIT 100;

Limit  (cost=0.11..164.75 rows=100 width=486)
  ->  Index Scan using record_database_id on record  (cost=0.11..1297917.10 rows=788366 width=486)
        Index Cond: (database_id = 'fa9bcfa6-8d89-4c95-b04a-24c85b169066'::uuid)

Few questions:

  • Why is the first query so much slower than the other two? I understand why the last one is faster but I don't understand why changing the ORDER BY DESC to ORDER BY makes such a difference. Amy I missing an index?
  • How can I speed up the initial query?

CodePudding user response:

The plan follows the index to read records in order of updated_at DESC, testing each for the desired database_id, and then stops as soon as it finds 100 which have the desired database_id. But that specific desired value of database_id is much more common on one end of the index than the other. There is nothing magically about the DESC here, presumably there is some other value of database_id for which it works the opposite way, finding 100 of them when descending much faster than when ascending. If you had done EXPLAIN (ANALYZE), this would have been immediately clear based on the different reported values for "Rows Removed by Filter"

If you have a multicolumn index on (database_id,updated_at), then it can immediately jump to the desired spot of the index and read the first 100 rows it finds (after "filtering" them for visibility), and that would work fast no matter which direction you want the ORDER BY to go.

  • Related