Home > Mobile >  Multi column order by kills query performance even when the time range does not contain any records
Multi column order by kills query performance even when the time range does not contain any records

Time:12-13

I have a fairly small table of 26 million records.

CREATE TABLE t1
(
   cam         varchar(100)    NOT NULL,
   updatedat     timestamp,
   objid      varchar(40)     NOT NULL,
   image         varchar(100)    NOT NULL,
   reader      varchar(60)     NOT NULL,
   imgcap        timestamp       NOT NULL
);

ALTER TABLE t1
   ADD CONSTRAINT t1_pk
   PRIMARY KEY (reader, cam, image, objid, imgcap);

I have a simple query to iterate the records between a time range.

SELECT * FROM t1  
WHERE updatedat >= '2021-12-09 20:30:00'  and updatedat <= '2021-12-09 20:32:01'
ORDER BY reader ASC , imgcap ASC, objid ASC, cam ASC, image ASC
LIMIT 10000 
OFFSET 0;

I added an index to support the query with the comparison as the left most field and the remaining elements to support the sort.

CREATE INDEX t1_idtmp ON t1 USING btree (updatedat , reader , imgcap , objid, cam, image);

However, the query takes more than 10 seconds to get complete. It takes same time even if there are no elements in the range.

  ->  Incremental Sort  (cost=8.28..3809579.24 rows=706729 width=223) (actual time=11034.114..11065.710 rows=10000 loops=1)
        Sort Key: reader, imgcap, objid, cam, image
        Presorted Key: reader, imgcap
        Full-sort Groups: 62  Sort Method: quicksort  Average Memory: 42kB  Peak Memory: 42kB
        Pre-sorted Groups: 62  Sort Methods: top-N heapsort, quicksort  Average Memory: 58kB  Peak Memory: 58kB
        ->  Index Scan using t1_idxevtim on t1  (cost=0.56..3784154.75 rows=706729 width=223) (actual time=11033.613..11036.823 rows=10129 loops=1)
              Filter: ((updatedat >= '2021-12-09 20:30:00'::timestamp without time zone) AND (updatedat <= '2021-12-09 20:32:01'::timestamp without time zone))
              Rows Removed by Filter: 25415461
Planning Time: 0.137 ms
Execution Time: 11066.791 ms

There are couple of more indexes on table to support other use cases.

CREATE INDEX t1_idxua ON t1 USING btree (updatedat);
CREATE INDEX t1_idxevtim ON t1 USING btree (reader, imgcap);

I think, Postgresql wants to avoid an expensive sort and thinks that pre sorted key will be faster but why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?

CodePudding user response:

why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?

Because the sort can't be satisfied by it. An btree index on (updatedat , reader , imgcap , objid, cam, image) can only produce data ordered by reader , imgcap , objid, cam, image for within ties of updatedat. So if your condition was for a specific value of updatedat, that would work. But since it is for a range of updatedat, that won't work as they are not all tied with each other.

  • Related