I have a table graphs with a column bbox:
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------------------------- ----------------------------- ----------- ---------- ------------------------------------ ---------- ------------- -------------- -------------
id | integer | | not null | nextval('graphs_id_seq'::regclass) | plain | | |
bbox_diagonale | double precision | | | | plain | | |
bbox | geometry(Polygon,4326) | | | | external | | |
Indexes:
"graphs_pkey" PRIMARY KEY, btree (id)
"bbox_diagonale_idx" btree (bbox_diagonale)
"bbox_index" gist (bbox) CLUSTER
Access method: heap
My goal is to find an entry in the table graphs where the bbox either is the same as the one im searching for or is maximum 1.4 times bigger then the requested one and contains the requested bbox.
To do so I use for example the following sql command:
SELECT id, bbox,BBOX_Diagonale FROM graphs
WHERE (
ST_CONTAINS(bbox,ST_MakeEnvelope( 11.71540516 , 47.77092524 , 12.32288277 , 48.17883335 , 4326))
OR
ST_Equals(bbox,ST_MakeEnvelope( 11.71540516 , 47.77092524 , 12.32288277 , 48.17883335 , 4326)))
AND BBOX_Diagonale <= 91
order by BBOX_Diagonale ASC LIMIT 1;
To accelerate the search I created the following indices:
CREATE INDEX bbox_index ON graphs USING gist(bbox);
CREATE INDEX BBOX_Diagonale_idx ON graphs (
BBOX_Diagonale ASC
);
Here is the explan printout if I run this query:
EXPLAIN (ANALYZE, BUFFERS) SELECT id, bbox,BBOX_Diagonale FROM graphs
WHERE (
ST_CONTAINS(bbox,ST_MakeEnvelope( 11.71540516 , 47.77092524 , 12.32288277 , 48.17883335 , 4326))
OR
ST_Equals(bbox,ST_MakeEnvelope( 11.71540516 , 47.77092524 , 12.32288277 , 48.17883335 , 4326)))
AND BBOX_Diagonale <= 91
order by BBOX_Diagonale ASC LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4476.66..4476.66 rows=1 width=132) (actual time=9.484..9.488 rows=1 loops=1)
Buffers: shared hit=506
-> Sort (cost=4476.66..4477.21 rows=219 width=132) (actual time=9.483..9.486 rows=1 loops=1)
Sort Key: bbox_diagonale
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=506
-> Bitmap Heap Scan on graphs (cost=591.39..4475.56 rows=219 width=132) (actual time=9.454..9.458 rows=1 loops=1)
Recheck Cond: ((st_contains(bbox, '0103000020E61000000100000005000000E62DCB95496E274000BBA2ADADE24740E62DCB95496E274091D7DE02E41648400C2FF3E350A5284091D7DE02E41648400C2FF3E350A5284000BBA2ADADE24740E62DCB95496E274000BBA2ADADE24740'::geometry) OR st_equals(bbox, '0103000020E61000000100000005000000E62DCB95496E274000BBA2ADADE24740E62DCB95496E274091D7DE02E41648400C2FF3E350A5284091D7DE02E41648400C2FF3E350A5284000BBA2ADADE24740E62DCB95496E274000BBA2ADADE24740'::geometry)) AND (bbox_diagonale <= '91'::double precision))
Filter: (st_contains(bbox, '0103000020E61000000100000005000000E62DCB95496E274000BBA2ADADE24740E62DCB95496E274091D7DE02E41648400C2FF3E350A5284091D7DE02E41648400C2FF3E350A5284000BBA2ADADE24740E62DCB95496E274000BBA2ADADE24740'::geometry) OR st_equals(bbox, '0103000020E61000000100000005000000E62DCB95496E274000BBA2ADADE24740E62DCB95496E274091D7DE02E41648400C2FF3E350A5284091D7DE02E41648400C2FF3E350A5284000BBA2ADADE24740E62DCB95496E274000BBA2ADADE24740'::geometry))
Heap Blocks: exact=1
Buffers: shared hit=503
-> BitmapAnd (cost=591.39..591.39 rows=76 width=0) (actual time=9.383..9.385 rows=0 loops=1)
Buffers: shared hit=502
-> BitmapOr (cost=6.69..6.69 rows=216 width=0) (actual time=3.165..3.166 rows=0 loops=1)
Buffers: shared hit=256
-> Bitmap Index Scan on bbox_index (cost=0.00..3.29 rows=108 width=0) (actual time=2.417..2.418 rows=2618 loops=1)
Index Cond: (bbox ~ '0103000020E61000000100000005000000E62DCB95496E274000BBA2ADADE24740E62DCB95496E274091D7DE02E41648400C2FF3E350A5284091D7DE02E41648400C2FF3E350A5284000BBA2ADADE24740E62DCB95496E274000BBA2ADADE24740'::geometry)
Buffers: shared hit=128
-> Bitmap Index Scan on bbox_index (cost=0.00..3.29 rows=108 width=0) (actual time=0.745..0.746 rows=0 loops=1)
Index Cond: (bbox ~= '0103000020E61000000100000005000000E62DCB95496E274000BBA2ADADE24740E62DCB95496E274091D7DE02E41648400C2FF3E350A5284091D7DE02E41648400C2FF3E350A5284000BBA2ADADE24740E62DCB95496E274000BBA2ADADE24740'::geometry)
Buffers: shared hit=128
-> Bitmap Index Scan on bbox_diagonale_idx (cost=0.00..584.39 rows=38117 width=0) (actual time=6.068..6.068 rows=37983 loops=1)
Index Cond: (bbox_diagonale <= '91'::double precision)
Buffers: shared hit=246
Planning:
Buffers: shared hit=247
Planning Time: 28.373 ms
Execution Time: 9.762 ms
The search for bigger bboxes takes long time. Is it possible to optimize this query / table combination ?
UPDATE (slower query)
EXPLAIN (ANALYZE, BUFFERS) SELECT id, bbox,BBOX_Diagonale FROM graphs
WHERE (
ST_CONTAINS(bbox,ST_MakeEnvelope( 9.272461,48.019324,12.700195,51.034486 , 4326))
OR
ST_Equals(bbox,ST_MakeEnvelope( 9.272461,48.019324,12.700195,51.034486, 4326)))
AND BBOX_Diagonale <= 410
order by BBOX_Diagonale ASC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..229.62 rows=1 width=132) (actual time=127.426..127.426 rows=0 loops=1)
Buffers: shared hit=83455
-> Index Scan using bbox_diagonale_idx on graphs (cost=0.42..4513494.64 rows=19692 width=132) (actual time=127.424..127.424 rows=0 loops=1)
Index Cond: (bbox_diagonale <= '410'::double precision)
Filter: (st_contains(bbox, '0103000020E61000000100000005000000F4DE1802808B22409203763579024840F4DE1802808B2240BE1589096A8449403CA583F57F662940BE1589096A8449403CA583F57F6629409203763579024840F4DE1802808B22409203763579024840'::geometry) OR st_equals(bbox, '0103000020E61000000100000005000000F4DE1802808B22409203763579024840F4DE1802808B2240BE1589096A8449403CA583F57F662940BE1589096A8449403CA583F57F6629409203763579024840F4DE1802808B22409203763579024840'::geometry))
Rows Removed by Filter: 89567
Buffers: shared hit=83455
Planning Time: 0.299 ms
Execution Time: 127.457 ms
(9 rows)
CodePudding user response:
It looks like postgis estimates the number of overlapping shapes and uses that estimate as the estimate for st_contains. That is unfortunate for you, as the larger a shape is, the more things that can overlap it, but the fewer things that can contain it. But this estimate is kind of reasonable, as it will need to visit those rows only to then reject them.
When it drastically overestimates the number of rows that matches the st_contains, it thinks it will be better to use the index to read rows already ordered by BBOX_Diagonale and then stop after the first one. So that is what it does, even though it never finds one to stop after. If you rewrite the query to look like ORDER BY BBOX_Diagonale 0
, that would force it not to use that index.