I have a fairly small table of 26 million records.
CREATE TABLE t1
(
cam varchar(100) NOT NULL,
updatedat timestamp,
objid varchar(40) NOT NULL,
image varchar(100) NOT NULL,
reader varchar(60) NOT NULL,
imgcap timestamp NOT NULL
);
ALTER TABLE t1
ADD CONSTRAINT t1_pk
PRIMARY KEY (reader, cam, image, objid, imgcap);
I have a simple query to iterate the records between a time range.
SELECT * FROM t1
WHERE updatedat >= '2021-12-09 20:30:00' and updatedat <= '2021-12-09 20:32:01'
ORDER BY reader ASC , imgcap ASC, objid ASC, cam ASC, image ASC
LIMIT 10000
OFFSET 0;
I added an index to support the query with the comparison as the left most field and the remaining elements to support the sort.
CREATE INDEX t1_idtmp ON t1 USING btree (updatedat , reader , imgcap , objid, cam, image);
However, the query takes more than 10 seconds to get complete. It takes same time even if there are no elements in the range.
-> Incremental Sort (cost=8.28..3809579.24 rows=706729 width=223) (actual time=11034.114..11065.710 rows=10000 loops=1)
Sort Key: reader, imgcap, objid, cam, image
Presorted Key: reader, imgcap
Full-sort Groups: 62 Sort Method: quicksort Average Memory: 42kB Peak Memory: 42kB
Pre-sorted Groups: 62 Sort Methods: top-N heapsort, quicksort Average Memory: 58kB Peak Memory: 58kB
-> Index Scan using t1_idxevtim on t1 (cost=0.56..3784154.75 rows=706729 width=223) (actual time=11033.613..11036.823 rows=10129 loops=1)
Filter: ((updatedat >= '2021-12-09 20:30:00'::timestamp without time zone) AND (updatedat <= '2021-12-09 20:32:01'::timestamp without time zone))
Rows Removed by Filter: 25415461
Planning Time: 0.137 ms
Execution Time: 11066.791 ms
There are couple of more indexes on table to support other use cases.
CREATE INDEX t1_idxua ON t1 USING btree (updatedat);
CREATE INDEX t1_idxevtim ON t1 USING btree (reader, imgcap);
I think, Postgresql wants to avoid an expensive sort and thinks that pre sorted key will be faster but why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?
CodePudding user response:
why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?
Because the sort can't be satisfied by it. An btree index on (updatedat , reader , imgcap , objid, cam, image)
can only produce data ordered by reader , imgcap , objid, cam, image
for within ties of updatedat. So if your condition was for a specific value of updatedat, that would work. But since it is for a range of updatedat, that won't work as they are not all tied with each other.