Home > Software design >  Why in my case ST_DWithin is not using index
Why in my case ST_DWithin is not using index

Time:11-30

I'm using Postgis extension for Postgres and trying to optimize my query for searching points in circle.

Consider I have this table with index:

create table position
(
 id bigserial not null primary key,
 date timestamp with time zone,
 point GEOMETRY(Point, 4326),
 alias varchar(50)
);
create index position_point_idx on position using gist (point);

Now when I use query with polygon everything work as expected. In explain plan I can see that query uses index.

SELECT distinct alias
FROM position
WHERE date > '2021-11-28T19:26:18.574Z'
  AND date < '2021-11-28T20:26:18.574Z'
  AND ST_contains(ST_GeomFromText(
                          'POLYGON ((13.970947489142418 49.59174558308953, 13.970947489142418 50.12515341892287, 15.208740681409838 50.12515341892287, 15.208740681409838 49.59174558308953, 13.970947489142418 49.59174558308953))',
                          4326), point);

  ->  Bitmap Index Scan on position_point_idx  (cost=0.00..183.82 rows=5254 width=0) (actual time=5.981..5.981 rows=94462 loops=1)

Okey now I want to search aliases in circle but for some reason it takes seconds and not using index at all.

SELECT distinct alias
FROM position
WHERE
date > '2021-11-28T19:26:18.574Z' AND date < '2021-11-28T20:26:18.574Z'
AND
ST_DWithin (point,ST_GeomFromText('POINT (14.32983409613371 
            49.91815471231952)',4326),62815.14152820495);

ST_DWithin is in list here so it should use index but it's ignoring it. What I'm doing wrong here? Thanks for any hint.

Here is my query plan

HashAggregate  (cost=687537.59..687538.59 rows=100 width=9) (actual time=2874.991..2875.003 rows=100 loops=1)
  Output: alias
"  Group Key: ""position"".alias"
  ->  Gather  (cost=1000.00..686702.70 rows=333955 width=9) (actual time=0.254..2041.354 rows=5008801 loops=1)
        Output: alias
        Workers Planned: 2
        Workers Launched: 2
"        ->  Parallel Seq Scan on public.""position""  (cost=0.00..652307.20 rows=139148 width=9) (actual time=0.021..2117.644 rows=1669600 loops=3)"
              Output: alias
"              Filter: ((""position"".date > '2021-11-28 19:26:18.574 00'::timestamp with time zone) AND (""position"".date < '2021-11-28 20:26:18.574 00'::timestamp with time zone) AND (""position"".point && '0103000020E6100000010000000500000077EC65F919AAEEC0B42AE025A7A5EEC077EC65F919AAEEC03A26ECE821B2EE4077646615AFADEE403A26ECE821B2EE4077646615AFADEE40B42AE025A7A5EEC077EC65F919AAEEC0B42AE025A7A5EEC0'::geometry) AND ('0101000020E61000000100C003E0A82C40520AF71786F54840'::geometry && st_expand(""position"".point, '62815.1415282049493'::double precision)) AND _st_dwithin(""position"".point, '0101000020E61000000100C003E0A82C40520AF71786F54840'::geometry, '62815.1415282049493'::double precision))"
              Rows Removed by Filter: 86028
              Worker 0: actual time=0.023..2492.854 rows=1922778 loops=1
              Worker 1: actual time=0.025..2493.448 rows=2024544 loops=1
Planning Time: 0.211 ms
Execution Time: 2876.783 ms

CodePudding user response:

PostgreSQL chooses a sequential scan because it thinks that that is the most efficient access strategy, and I would say it is right. After all, the WHERE condition removed only 250000 out of approximately 5 million rows.

CodePudding user response:

I think you wanted to use geography, not geometry. In geometry 4326, the entire earth (and the rest of the universe, I suppose) is well within 62815.14152820495 degrees of every other point, so the index would be profoundly ineffective.

If you were using geography, that would be 39 miles, for which the index would be useful, and in my hands it would be used.

The stats on your table also seem to be way off.

  • Related