I would need help in re-writing query to improve performance. I think below query is slow because of OR part and two sub queries for products
table (so twice scanning) for the each key in category
table.
SELECT key
FROM category c
WHERE (1=1)
AND ( ((EXISTS (SELECT * from products p
WHERE p.attribute_key=2
AND p.category_key=c.key
AND ((value && CAST(ARRAY['Active', 'active'] AS text[])))
AND p.status='active'
))
OR(NOT EXISTS(SELECT * from products p
WHERE p.attribute_key=2
AND p.category_key=c.key
AND p.status='active')
))
)
AND c.status='active'
AND c.type_key=4
expected output is (9 rows) below
key
1
13
3
6
2
7
4
10
15
Query returns keys from category table if type_key=4 and category.status='active' and
products table has value='Active' or 'active' and attribute_key=2 and status='active. (this is
EXISTS
part in query)even if products table doesn't have any attribute_key or has attribute_key!=2 records for type_key=4. (this is
OR( NOT EXISTS)
part in query) Example: type_key 13 and 3
Query explain analyze plan is below. indexes are used well.
I hope query can be improved by writing another way or changing OR part in query
sample data is here dbfiddle
Aggregate (cost=3156220.03..3156220.04 rows=1 width=8) (actual time=86100.329..86100.342 rows=1 loops=1)
-> Index Scan using category_type_key_status on category c (cost=0.43..3155906.64 rows=125355 width=0) (actual time=12.618..85925.747 rows=120852 loops=1)
Index Cond: ((type_key = 4) AND (status = 'active'))
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (NOT (SubPlan 3)))
Rows Removed by Filter: 86879
SubPlan 1
-> Index Scan using products_category_key_attribute_key_status on products p (cost=0.56..8.59 rows=1 width=0) (actual time=0.332..0.332 rows=1 loops=207731)
Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
Filter: (value && '{Active,active}'::text[])
Rows Removed by Filter: 0
SubPlan 2
-> Gather (cost=1000.00..1155110.30 rows=8916 width=4) (never executed)
Workers Planned: 2
Workers Launched: 0
-> Parallel Seq Scan on products p_1 (cost=0.00..1153218.70 rows=3715 width=4) (never executed)
Filter: ((value && '{Active,active}'::text[]) AND (attribute_key = 2) AND (status = 'active'))
SubPlan 3
-> Index Only Scan using products_category_key_attribute_key_status on products p_2 (cost=0.56..8.58 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=86933)
Index Cond: ((category_key = c.key) AND (attribute_key = 2) AND (status = 'active'))
Heap Fetches: 11497
Planning Time: 35.808 ms
Planning Time: 35.808 ms
CodePudding user response:
Both of the predicates on the products table are almost exactly the same. You can simply join this table once and use OR to apply either of the conditions:
SELECT c.key
FROM category c
LEFT OUTER
JOIN products p
ON p.category_key = c.key
AND p.status = 'active'
AND p.attribute_key = 2
WHERE c.status='active'
AND c.type_key=4
AND ( p.category_key IS NULL -- NOT EXISTS
OR ((value && CAST(ARRAY['Active', 'active'] AS text[])))) -- or value matches