Home > OS >  PostgreSQL query slow when adding EXTRACT ISODOW
PostgreSQL query slow when adding EXTRACT ISODOW

Time:04-22

I have a large table(117 899 162 rows) the query below is very slow but if I remove EXTRACT(ISODOW FROM l2.starttime) IN (1) the performance is alot better. I guess a index is skipped when adding isodow. Is there any way to improve this query?

SELECT l2.* FROM ListenerActivity l1 JOIN ListenerActivity l2 ON l1.id != l2.id 
WHERE l2.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' 
AND l1.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' 
AND l1.memberid = l2.memberid 
AND l1.station != l2.station 
AND l1.station = 928 
AND (l2.starttime - l1.endtime) = INTERVAL '1 second'
AND EXTRACT(ISODOW FROM l2.starttime) IN (1)

Execution plan without isodow:

"Merge Join  (cost=36594.99..36611.22 rows=1 width=47) (actual time=518.095..650.036 rows=1172 loops=1)"
"  Merge Cond: (l1.memberid = l2.memberid)"
"  Join Filter: ((l1.id <> l2.id) AND (l1.station <> l2.station) AND ((l2.starttime - l1.endtime) = '00:00:01'::interval))"
"  Rows Removed by Join Filter: 191168"
"  Buffers: shared hit=74251, temp read=7202 written=4341"
"  ->  Sort  (cost=11946.67..11946.89 rows=433 width=24) (actual time=199.983..200.720 rows=8785 loops=1)"
"        Sort Key: l1.memberid"
"        Sort Method: quicksort  Memory: 1071kB"
"        Buffers: shared hit=3435"
"        ->  Bitmap Heap Scan on listeneractivity l1  (cost=11082.90..11942.88 rows=433 width=24) (actual time=195.014..197.932 rows=8785 loops=1)"
"              Recheck Cond: ((starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone) AND (station = 928))"
"              Heap Blocks: exact=464"
"              Buffers: shared hit=3435"
"              ->  BitmapAnd  (cost=11082.90..11082.90 rows=433 width=0) (actual time=194.946..194.947 rows=0 loops=1)"
"                    Buffers: shared hit=2971"
"                    ->  Bitmap Index Scan on listeneractivity_starttime_idx  (cost=0.00..56.66 rows=14273 width=0) (actual time=14.607..14.607 rows=253370 loops=1)"
"                          Index Cond: ((starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"                          Buffers: shared hit=294"
"                    ->  Bitmap Index Scan on listeneractivity_station_idx  (cost=0.00..11026.15 rows=3580024 width=0) (actual time=179.723..179.723 rows=3417368 loops=1)"
"                          Index Cond: (station = 928)"
"                          Buffers: shared hit=2677"
"  ->  Sort  (cost=24648.32..24655.45 rows=14273 width=47) (actual time=310.901..378.990 rows=414039 loops=1)"
"        Sort Key: l2.memberid"
"        Sort Method: external sort  Disk: 17360kB"
"        Buffers: shared hit=70816, temp read=5770 written=4341"
"        ->  Index Scan using listeneractivity_starttime_idx on listeneractivity l2  (cost=0.11..24451.34 rows=14273 width=47) (actual time=0.018..112.574 rows=253370 loops=1)"
"              Index Cond: ((starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"              Buffers: shared hit=70815"
"Planning:"
"  Buffers: shared hit=20"
"Planning Time: 0.230 ms"
"Execution Time: 652.318 ms"

Execution plan with isodow:

"QUERY PLAN"
"Nested Loop  (cost=11060.88..25732.92 rows=1 width=47) (actual time=1011.050..45688.906 rows=175 loops=1)"
"  Join Filter: ((l1.id <> l2.id) AND (l1.station <> l2.station) AND (l1.memberid = l2.memberid) AND ((l2.starttime - l1.endtime) = '00:00:01'::interval))"
"  Rows Removed by Join Filter: 304104051"
"  Buffers: shared hit=50748"
"  ->  Bitmap Heap Scan on listeneractivity l1  (cost=11060.77..11556.41 rows=249 width=24) (actual time=265.118..280.547 rows=6922 loops=1)"
"        Recheck Cond: ((starttime >= '2022-04-18 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone) AND (station = 928))"
"        Heap Blocks: exact=341"
"        Buffers: shared hit=3208"
"        ->  BitmapAnd  (cost=11060.77..11060.77 rows=249 width=0) (actual time=263.834..263.835 rows=0 loops=1)"
"              Buffers: shared hit=2867"
"              ->  Bitmap Index Scan on listeneractivity_starttime_idx  (cost=0.00..34.54 rows=8214 width=0) (actual time=8.036..8.036 rows=163144 loops=1)"
"                    Index Cond: ((starttime >= '2022-04-18 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"                    Buffers: shared hit=190"
"              ->  Bitmap Index Scan on listeneractivity_station_idx  (cost=0.00..11026.15 rows=3580024 width=0) (actual time=254.729..254.729 rows=3417368 loops=1)"
"                    Index Cond: (station = 928)"
"                    Buffers: shared hit=2677"
"  ->  Materialize  (cost=0.11..14115.28 rows=41 width=47) (actual time=0.000..1.998 rows=43933 loops=6922)"
"        Buffers: shared hit=47540"
"        ->  Index Scan using listeneractivity_starttime_idx on listeneractivity l2  (cost=0.11..14115.24 rows=41 width=47) (actual time=0.028..151.353 rows=43933 loops=1)"
"              Index Cond: ((starttime >= '2022-04-18 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"              Filter: (date_part('isodow'::text, starttime) = '1'::double precision)"
"              Rows Removed by Filter: 119211"
"              Buffers: shared hit=47540"
"Planning:"
"  Buffers: shared hit=20"
"Planning Time: 0.241 ms"
"Execution Time: 45689.757 ms" 

image of below info

id | integer | 
audiencetype | character varying(255) | 
endtime | timestamp without time zone | 
inoutofhome | character varying(255) | 
starttime | timestamp without time zone | 
memberid | bigint | 
station | integer |
Indexes: 
   "listeneractivity_pkey" PRIMARY KEY, btree (id) 
   "listeneractivity_endtime_idx" btree (endtime) 
   "listeneractivity_starttime_idx" btree (starttime) 
   "listeneractivity_station_idx" btree (station) 
Foreign-key constraints: 
   "fkewxhyebhex19kpytnanu9yq1s" FOREIGN KEY (memberid) REFERENCES member(id) 
   "listeneractivity_fk_1" FOREIGN KEY (memberid) REFERENCES member(id) 
   "listeneractivity_fk_2" FOREIGN KEY (station) REFERENCES local_station(id)

CodePudding user response:

Can you try this query?

I moved all the expressions that depend on l2 to the ON-clause.

The expression l2.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' seems not needed because of the (l2.starttime - l1.endtime) = INTERVAL '1 second'

SELECT 
   l2.* 
FROM ListenerActivity l1 
JOIN ListenerActivity l2 ON l1.id != l2.id 
                        and l2.starttime BETWEEN l1.endtime - '1 second' and l1.endtime   '1 second'
                        and l1.memberid = l2.memberid
                        and l1.station != l2.station
                        and EXTRACT(ISODOW FROM l2.starttime) IN (1)
WHERE  
      l1.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' 
  AND l1.station = 928 
;

Please share execution time of this query, and if possible edit this answer, and add the execution plan

EDIT:

When you do these 2 steps:

  1. l2.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00'
  2. EXTRACT(ISODOW FROM l2.starttime) IN (1)

Then you basically say the start time should be between April17 and Arpil19, AND it should be a monday....

I think it might be quicker to just question the Monday (in stead of the earlier mentioned 2 steps).

l2.starttime BETWEEN '2022-04-18T00:00:00' AND '2022-04-18T22:59:59'

EDIT2:

Adding this index might help:

CREATE INDEX ON ListenerActivity (station,starttime,memberid);

CodePudding user response:

Both queries are slow and both remove many rows from the join filters. That means that they can't filter by just using the index.

You have to try some, I would start with this one:

CREATE INDEX ON ListenerActivity (starttime, station, memberid, id, EXTRACT(ISODOW FROM starttime));

But a different order of the columns in the index might work better. Just give it a try.

After you created an index, use EXPLAIN(ANALYZE, VERBOSE, BUFFERS) to get the query plan used. You might have to drop the current indexes to force the database to use the new one. This one is the bigger one and because of that the database might think it's not the best one.

Creating additional statistics for the combination of columns might also help the query planner.

  • Related