I have created a view that I use to generate some reports afterwards. When first creating it, I used to have a small amount of data and couldn't notice any issues.
The view now takes forever to load and leads to a lot of performance issues. I tried to limit the view this way:
WHERE date_part('year'::text, e.date) >= date_part('year'::text, CURRENT_DATE - '1 year'::interval);
It unfortunately did not help at all. Is there an efficient way to speed up my query?
CREATE OR REPLACE VIEW public.v_lpr
AS WITH e AS (
SELECT e_1.carpark_id,
e_1.event_time AS date,
e_1.device_type,
e_1.event_type,
e_1.ticket_type,
sum(
CASE
WHEN e_1.count > 0 THEN e_1.count
ELSE 1
END) AS sum_event,
e_1.licenseplatekey
FROM events e_1
GROUP BY e_1.carpark_id, e_1.event_type, e_1.device_type, e_1.ticket_type, e_1.event_time, e_1.licenseplatekey
)
SELECT e.carpark_id,
e.date,
e.device_type,
COALESCE(ddt.description, 'Unknown'::character varying) AS device_type_name,
e.event_type,
COALESCE(det.description, 'Unknown'::character varying) AS event_type_name,
e.ticket_type,
COALESCE(dtt.description, 'Other'::character varying) AS ticket_type_name,
e.sum_event AS count,
e.licenseplatekey,
d.country,
d.state,
d.district,
df.facilityname,
df.tenantname
FROM e
JOIN dim_licenseplate d ON d.key = e.licenseplatekey
JOIN dimfacility df ON df.facilityid = e.carpark_id AND e.date >= date(df.scd_start) AND (e.date < date(df.scd_end) OR df.scd_end IS NULL)
LEFT JOIN dimtype dtt ON dtt.typeid = e.ticket_type AND dtt.mapping_type::text = 'ticket_type'::text
LEFT JOIN dimtype ddt ON ddt.typeid = e.device_type AND ddt.mapping_type::text = 'device_type'::text
LEFT JOIN dimtype det ON det.typeid = e.event_type AND det.mapping_type::text = 'event_type'::text;
Query Plan (https://explain.dalibo.com/plan/lGl)
Hash Left Join (cost=10060558.48..11195587.02 rows=5775196 width=204) (actual time=85580.589..507072.608 rows=6734265 loops=1)
Hash Cond: (e.event_type = det.typeid)
Buffers: shared hit=6373 read=1118955, temp read=1251698 written=1668112
CTE e
-> Finalize GroupAggregate (cost=7492458.49..10060413.80 rows=6764099 width=40) (actual time=85578.911..256822.632 rows=67400480 loops=1)
Group Key: e_1.carpark_id, e_1.event_type, e_1.device_type, e_1.ticket_type, e_1.event_time, e_1.licenseplatekey
Buffers: shared hit=6289 read=1118955, temp read=1251698 written=1253441
-> Gather Merge (cost=7492458.49..9756029.35 rows=13528198 width=40) (actual time=85578.901..211729.452 rows=67431826 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=6289 read=1118955, temp read=1251698 written=1253441
-> Partial GroupAggregate (cost=7491458.47..8193539.61 rows=6764099 width=40) (actual time=84935.312..121437.041 rows=22477275 loops=3)
Group Key: e_1.carpark_id, e_1.event_type, e_1.device_type, e_1.ticket_type, e_1.event_time, e_1.licenseplatekey
Buffers: shared hit=6289 read=1118955, temp read=1251698 written=1253441
-> Sort (cost=7491458.47..7561951.82 rows=28197340 width=36) (actual time=84935.300..103478.557 rows=22548366 loops=3)
Sort Key: e_1.carpark_id, e_1.event_type, e_1.device_type, e_1.ticket_type, e_1.event_time, e_1.licenseplatekey
Sort Method: external merge Disk: 947672kB
Buffers: shared hit=6289 read=1118955, temp read=1251698 written=1253441
-> Result (cost=0.00..1689104.77 rows=28197340 width=36) (actual time=0.066..15116.298 rows=22548366 loops=3)
Buffers: shared hit=6203 read=1118955
-> Append (cost=0.00..1407131.37 rows=28197340 width=36) (actual time=0.065..11417.413 rows=22548366 loops=3)
Buffers: shared hit=6203 read=1118955
-> Parallel Seq Scan on events e_1 (cost=0.00..0.00 rows=1 width=36) (actual time=0.001..0.001 rows=0 loops=3)
-> Parallel Seq Scan on events_2018_01 e_1_1 (cost=0.00..27341.93 rows=538092 width=36) (actual time=0.063..187.542 rows=430474 loops=3)
Buffers: shared hit=128 read=21833
-> Parallel Seq Scan on events_2018_02 e_1_2 (cost=0.00..24481.30 rows=482030 width=36) (actual time=0.448..160.483 rows=385624 loops=3)
Buffers: shared hit=128 read=19533
-> Parallel Seq Scan on events_2018_03 e_1_3 (cost=0.00..27685.44 rows=544444 width=36) (actual time=0.076..167.997 rows=435555 loops=3)
Buffers: shared hit=128 read=22113
-> Parallel Seq Scan on events_2018_04 e_1_4 (cost=0.00..24099.90 rows=473690 width=36) (actual time=0.080..146.110 rows=378952 loops=3)
Buffers: shared hit=128 read=19235
-> Parallel Seq Scan on events_2018_05 e_1_5 (cost=0.00..24558.35 rows=484635 width=36) (actual time=0.095..153.181 rows=387708 loops=3)
Buffers: shared hit=128 read=19584
-> Parallel Seq Scan on events_2018_06 e_1_6 (cost=0.00..26523.80 rows=525080 width=36) (actual time=0.077..175.118 rows=420064 loops=3)
Buffers: shared hit=96 read=21177
-> Parallel Seq Scan on events_2018_07 e_1_7 (cost=0.00..25321.24 rows=502424 width=36) (actual time=0.099..158.335 rows=401939 loops=3)
Buffers: shared hit=96 read=20201
-> Parallel Seq Scan on events_2018_08 e_1_8 (cost=0.00..27389.49 rows=542949 width=36) (actual time=0.116..174.982 rows=434359 loops=3)
Buffers: shared hit=96 read=21864
-> Parallel Seq Scan on events_2018_09 e_1_9 (cost=0.00..26073.71 rows=516771 width=36) (actual time=0.103..166.982 rows=413417 loops=3)
Buffers: shared hit=96 read=20810
-> Parallel Seq Scan on events_2018_10 e_1_10 (cost=0.00..27452.47 rows=545148 width=36) (actual time=0.071..175.492 rows=436118 loops=3)
Buffers: shared hit=96 read=21905
-> Parallel Seq Scan on events_2018_11 e_1_11 (cost=0.00..27293.67 rows=535167 width=36) (actual time=0.072..174.818 rows=428133 loops=3)
Buffers: shared hit=96 read=21846
-> Parallel Seq Scan on events_2018_12 e_1_12 (cost=0.00..27826.48 rows=540748 width=36) (actual time=0.108..177.344 rows=432599 loops=3)
Buffers: shared hit=96 read=22323
-> Parallel Seq Scan on events_2019_01 e_1_13 (cost=0.00..28930.51 rows=572751 width=36) (actual time=0.112..183.352 rows=458201 loops=3)
Buffers: shared hit=96 read=23107
-> Parallel Seq Scan on events_2019_02 e_1_14 (cost=0.00..27237.47 rows=542248 width=36) (actual time=0.072..174.700 rows=433798 loops=3)
Buffers: shared hit=96 read=21719
-> Parallel Seq Scan on events_2019_03 e_1_15 (cost=0.00..29731.53 rows=591053 width=36) (actual time=0.106..192.812 rows=472843 loops=3)
Buffers: shared hit=96 read=23725
-> Parallel Seq Scan on events_2019_04 e_1_16 (cost=0.00..27408.80 rows=545680 width=36) (actual time=0.075..170.280 rows=436544 loops=3)
Buffers: shared hit=96 read=21856
-> Parallel Seq Scan on events_2019_05 e_1_17 (cost=0.00..29647.14 rows=590814 width=36) (actual time=0.094..186.307 rows=472651 loops=3)
Buffers: shared hit=96 read=23643
-> Parallel Seq Scan on events_2019_06 e_1_18 (cost=0.00..27774.97 rows=551497 width=36) (actual time=0.118..178.236 rows=441197 loops=3)
Buffers: shared hit=96 read=22164
-> Parallel Seq Scan on events_2019_07 e_1_19 (cost=0.00..29152.86 rows=580286 width=36) (actual time=0.097..184.164 rows=464229 loops=3)
Buffers: shared hit=96 read=23254
-> Parallel Seq Scan on events_2019_08 e_1_20 (cost=0.00..28940.58 rows=576558 width=36) (actual time=0.072..193.012 rows=461247 loops=3)
Buffers: shared hit=96 read=23079
-> Parallel Seq Scan on events_2019_09 e_1_21 (cost=0.00..28803.81 rows=574881 width=36) (actual time=0.075..185.001 rows=459905 loops=3)
Buffers: shared hit=96 read=22959
-> Parallel Seq Scan on events_2019_10 e_1_22 (cost=0.00..31280.18 rows=622418 width=36) (actual time=0.072..197.873 rows=497934 loops=3)
Buffers: shared hit=96 read=24960
-> Parallel Seq Scan on events_2019_11 e_1_23 (cost=0.00..34005.10 rows=673210 width=36) (actual time=0.084..211.443 rows=538568 loops=3)
Buffers: shared hit=96 read=27177
-> Parallel Seq Scan on events_2019_12 e_1_24 (cost=0.00..34861.76 rows=688076 width=36) (actual time=0.074..228.706 rows=550461 loops=3)
Buffers: shared hit=96 read=27885
-> Parallel Seq Scan on events_2020_01 e_1_25 (cost=0.00..32485.37 rows=644137 width=36) (actual time=0.071..202.208 rows=515309 loops=3)
Buffers: shared hit=96 read=25948
-> Parallel Seq Scan on events_2020_02 e_1_26 (cost=0.00..31654.59 rows=627759 width=36) (actual time=0.075..188.000 rows=502207 loops=3)
Buffers: shared hit=96 read=25281
-> Parallel Seq Scan on events_2020_03 e_1_27 (cost=0.00..20041.85 rows=396985 width=36) (actual time=0.135..122.292 rows=317588 loops=3)
Buffers: shared hit=96 read=15976
-> Parallel Seq Scan on events_2020_04 e_1_28 (cost=0.00..10333.11 rows=205111 width=36) (actual time=0.075..66.141 rows=164089 loops=3)
Buffers: shared hit=96 read=8186
-> Parallel Seq Scan on events_2020_05 e_1_29 (cost=0.00..19341.51 rows=383051 width=36) (actual time=0.074..129.225 rows=306441 loops=3)
Buffers: shared hit=96 read=15415
-> Parallel Seq Scan on events_2020_06 e_1_30 (cost=0.00..24443.33 rows=483733 width=36) (actual time=0.078..162.250 rows=386987 loops=3)
Buffers: shared hit=96 read=19510
-> Parallel Seq Scan on events_2020_07 e_1_31 (cost=0.00..28273.57 rows=561657 width=36) (actual time=0.077..172.975 rows=449326 loops=3)
Buffers: shared hit=96 read=22561
-> Parallel Seq Scan on events_2020_08 e_1_32 (cost=0.00..28377.88 rows=563188 width=36) (actual time=0.107..185.493 rows=450550 loops=3)
Buffers: shared hit=96 read=22650
-> Parallel Seq Scan on events_2020_09 e_1_33 (cost=0.00..29849.43 rows=592443 width=36) (actual time=0.116..180.517 rows=473954 loops=3)
Buffers: shared hit=96 read=23829
-> Parallel Seq Scan on events_2020_10 e_1_34 (cost=0.00..30887.97 rows=613598 width=36) (actual time=0.083..197.719 rows=490878 loops=3)
Buffers: shared hit=96 read=24656
-> Parallel Seq Scan on events_2020_11 e_1_35 (cost=0.00..24920.20 rows=495220 width=36) (actual time=0.106..160.860 rows=396176 loops=3)
Buffers: shared hit=96 read=19872
-> Parallel Seq Scan on events_2020_12 e_1_36 (cost=0.00..20743.41 rows=413641 width=36) (actual time=0.128..128.392 rows=330913 loops=3)
Buffers: shared hit=96 read=16511
-> Parallel Seq Scan on events_2021_01 e_1_37 (cost=0.00..13223.51 rows=262851 width=36) (actual time=0.070..88.114 rows=210281 loops=3)
Buffers: shared hit=96 read=10499
-> Parallel Seq Scan on events_2021_02 e_1_38 (cost=0.00..13419.72 rows=266672 width=36) (actual time=0.075..82.828 rows=213338 loops=3)
Buffers: shared hit=96 read=10657
-> Parallel Seq Scan on events_2021_03 e_1_39 (cost=0.00..20760.23 rows=417123 width=36) (actual time=0.069..132.068 rows=333698 loops=3)
Buffers: shared hit=96 read=16493
-> Parallel Seq Scan on events_2021_04 e_1_40 (cost=0.00..18513.66 rows=371566 width=36) (actual time=0.088..115.614 rows=297253 loops=3)
Buffers: shared hit=96 read=14702
-> Parallel Seq Scan on events_2021_05 e_1_41 (cost=0.00..22992.24 rows=461124 width=36) (actual time=0.078..156.359 rows=368899 loops=3)
Buffers: shared hit=96 read=18285
-> Parallel Seq Scan on events_2021_06 e_1_42 (cost=0.00..32086.47 rows=653047 width=36) (actual time=0.106..211.789 rows=522438 loops=3)
Buffers: shared hit=96 read=25460
-> Parallel Seq Scan on events_2021_07 e_1_43 (cost=0.00..35158.22 rows=717622 width=36) (actual time=0.107..224.169 rows=574097 loops=3)
Buffers: shared hit=96 read=27886
-> Parallel Seq Scan on events_2021_08 e_1_44 (cost=0.00..36240.32 rows=741232 width=36) (actual time=0.076..236.441 rows=592985 loops=3)
Buffers: shared hit=96 read=28732
-> Parallel Seq Scan on events_2021_09 e_1_45 (cost=0.00..36505.14 rows=748614 width=36) (actual time=0.065..230.759 rows=598893 loops=3)
Buffers: shared hit=96 read=28923
-> Parallel Seq Scan on events_2021_10 e_1_46 (cost=0.00..43014.50 rows=894550 width=36) (actual time=0.085..276.106 rows=715643 loops=3)
Buffers: shared hit=96 read=33973
-> Parallel Seq Scan on events_2021_11 e_1_47 (cost=0.00..40632.53 rows=841652 width=36) (actual time=0.077..258.176 rows=673459 loops=3)
Buffers: shared hit=96 read=32120
-> Parallel Seq Scan on events_2021_12 e_1_48 (cost=0.00..42952.06 rows=894306 width=36) (actual time=0.100..268.463 rows=715962 loops=3)
Buffers: shared hit=96 read=33913
-> Parallel Seq Scan on events_2022_01 e_1_49 (cost=0.00..36789.60 rows=765160 width=36) (actual time=0.072..239.944 rows=612128 loops=3)
Buffers: shared hit=96 read=29042
-> Parallel Seq Scan on events_2022_02 e_1_50 (cost=0.00..37959.46 rows=794046 width=36) (actual time=0.078..263.473 rows=635949 loops=3)
Buffers: shared hit=96 read=29923
-> Parallel Seq Scan on events_2022_03 e_1_51 (cost=0.00..1709.01 rows=46601 width=36) (actual time=0.024..8.813 rows=26407 loops=3)
Buffers: shared hit=1243
-> Hash Left Join (cost=142.62..1119690.02 rows=5775196 width=146) (actual time=85580.537..503667.299 rows=6734265 loops=1)
Hash Cond: (e.device_type = ddt.typeid)
Buffers: shared hit=2318 read=376956, temp read=424087 written=839346
-> Hash Left Join (cost=140.55..1104206.82 rows=5775196 width=127) (actual time=85580.506..501171.677 rows=6734265 loops=1)
Hash Cond: (e.ticket_type = dtt.typeid)
Buffers: shared hit=2317 read=376956, temp read=424087 written=839346
-> Hash Join (cost=138.24..1088723.38 rows=5775196 width=108) (actual time=85580.434..498639.758 rows=6734265 loops=1)
Hash Cond: (e.licenseplatekey = d.key)
Buffers: shared hit=2316 read=376956, temp read=424087 written=839346
-> Hash Join (cost=102.45..1073458.17 rows=5775196 width=71) (actual time=85579.980..490328.909 rows=67367271 loops=1)
Hash Cond: (e.carpark_id = df.facilityid)
Join Filter: ((e.date >= date(df.scd_start)) AND ((e.date < date(df.scd_end)) OR (df.scd_end IS NULL)))
Rows Removed by Join Filter: 648583247
Buffers: shared hit=2301 read=376956, temp read=424087 written=839346
-> CTE Scan on e (cost=0.00..135281.98 rows=6764099 width=40) (actual time=85578.915..298598.190 rows=67400480 loops=1)
Buffers: shared hit=2235 read=376956, temp read=424087 written=839346
-> Hash (cost=82.20..82.20 rows=1620 width=55) (actual time=1.026..1.027 rows=1639 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 161kB
Buffers: shared hit=66
-> Seq Scan on dimfacility df (cost=0.00..82.20 rows=1620 width=55) (actual time=0.026..0.648 rows=1639 loops=1)
Buffers: shared hit=66
-> Hash (cost=24.24..24.24 rows=924 width=41) (actual time=0.430..0.431 rows=924 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 76kB
Buffers: shared hit=15
-> Seq Scan on dim_licenseplate d (cost=0.00..24.24 rows=924 width=41) (actual time=0.016..0.211 rows=924 loops=1)
Buffers: shared hit=15
-> Hash (cost=1.95..1.95 rows=29 width=23) (actual time=0.061..0.062 rows=29 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=1
-> Seq Scan on dimtype dtt (cost=0.00..1.95 rows=29 width=23) (actual time=0.015..0.026 rows=29 loops=1)
Filter: ((mapping_type)::text = 'ticket_type'::text)
Rows Removed by Filter: 47
Buffers: shared hit=1
-> Hash (cost=1.95..1.95 rows=9 width=23) (actual time=0.022..0.023 rows=9 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on dimtype ddt (cost=0.00..1.95 rows=9 width=23) (actual time=0.009..0.018 rows=9 loops=1)
Filter: ((mapping_type)::text = 'device_type'::text)
Rows Removed by Filter: 67
Buffers: shared hit=1
-> Hash (cost=1.95..1.95 rows=9 width=23) (actual time=0.041..0.042 rows=9 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on dimtype det (cost=0.00..1.95 rows=9 width=23) (actual time=0.032..0.037 rows=9 loops=1)
Filter: ((mapping_type)::text = 'event_type'::text)
Rows Removed by Filter: 67
Buffers: shared hit=1
Planning time: 4.618 ms
Execution time: 508178.331 ms
Indexes on source table
- Tablename: Events (with partitions e.g. events_2021_01, events_2021_02, etc)
- I also just saw that I have indexes till 2021_07 only, maybe it could be the cause?
CREATE UNIQUE INDEX events_pkey1 ON public.events USING btree (id);
CREATE INDEX idx_duration_entry ON public.events USING btree (card_nr) WHERE ((device_type = 1) AND ((event_type = 2) OR (event_type = 12)) AND ((manufacturer)::text ~~ 'DESIGNA_ABACUS%'::text));
CREATE INDEX idx_duration_exit ON public.events USING btree (card_nr) WHERE ((device_type = 2) AND ((event_type = 2) OR (event_type = 12)) AND ((manufacturer)::text ~~ 'DESIGNA_ABACUS%'::text));
------------------------------------------------------
These indexes are repeated on all partition tables till 2021_07.
CREATE INDEX idx_carpark_events_2018_01 ON public.events_2018_01 USING btree (carpark_id, date_part('year'::text, event_time), date_part('week'::text, event_time), date_part('day'::text, event_time), event_type);
CREATE INDEX idx_events_2018_01 ON public.events_2018_01 USING btree (event_time);
CREATE INDEX idx_events_manufacturer_2018_01 ON public.events_2018_01 USING btree (event_time DESC, manufacturer);
CREATE INDEX idx_carpark_events_2018_02 ON public.events_2018_02 USING btree (carpark_id, date_part('year'::text, event_time), date_part('week'::text, event_time), date_part('day'::text, event_time), event_type);```
**Source Table DDL**
CREATE TABLE public.events (
id int8 NOT NULL,
odb_created_at timestamp NOT NULL,
event_time timestamp NOT NULL,
device_type int4 NOT NULL,
event_type int4 NOT NULL,
ticket_type int4 NOT NULL,
card_nr varchar(100) NULL,
count int4 NOT NULL DEFAULT 1,
manufacturer varchar(200) NULL,
carpark_id int8 NULL,
licenseplatekey int4 NULL,
CONSTRAINT events_pkey1 PRIMARY KEY (id)
);
CREATE INDEX idx_duration_entry ON public.events USING btree (card_nr) WHERE ((device_type = 1) AND ((event_type = 2) OR (event_type = 12)) AND ((manufacturer)::text ~~ 'DESIGNA_ABACUS%'::text));
CREATE INDEX idx_duration_exit ON public.events USING btree (card_nr) WHERE ((device_type = 2) AND ((event_type = 2) OR (event_type = 12)) AND ((manufacturer)::text ~~ 'DESIGNA_ABACUS%'::text));
-- Table Triggers
create trigger events_partition_trigger before
insert
on
public.events for each row execute procedure events_partition_function();
CodePudding user response:
The condition date_part('year'::text, e.date) ...
can't make use of an index on the date
column.
If you only want rows from the current year, use:
WHERE e.date >= date_trunc('year', CURRENT_DATE);
date_trunc('year', CURRENT_DATE)
returns the start of "this year" (January, 1st)
If you want rows from this year and last year you can use:
WHERE e.date >= date_trunc('year', CURRENT_DATE) - interval '1 year';
The OR
condition in the join is also likely a performance killer.
You can avoid the OR in this condition:
e.date >= date(df.scd_start) AND (e.date < date(df.scd_end) OR df.scd_end IS NULL)
by using a daterange that automatically treats null
in the upper end as "unbounded"
e.date <@ daterange(df.scd_start::date, df.scd_end::date)
And possibly create an index on that daterange:
create index on dimfacility
using gist (daterange(scd_start::date, scd_end::date))
If e.date
is not a date
but a timestamp
and the scd_start
and scd_end
are also timestamps, then use a tsrange
:
e.date <@ tsrange(df.scd_start, df.scd_end)
create index on dimfacility
using gist (tsrange(scd_start, scd_end))
CodePudding user response:
Your WHERE clause is not sargable due to the fact that you modify the value in the column.
WHERE date_part('year'::text, e.date) ...
So no index can be used.
Rewrite your WHERE clause to be sargable like this :
WHERE e.date >= CAST(CAST(date_part('year', CURRENT_DATE) -1 AS CHAR(4) ||'-01-01' AS DATE)