This takes about 2h to load and it's too large for me to use it in my BI tool. The culprit is huge_table
. I've added a date partition (ds) to huge_table but it still takes really long to load. Please help me optimize the query below:
WITH huge.table AS (
SELECT distinct ORDER_ID, account_id, ds
FROM "huge.table"
WHERE yesno_condition='y' AND days_active>0 )
, CTE2 AS ( SELECT CTE2.EMAIL, CTE2.CONTACT_ID FROM
Contact_details_table JOIN huge.table HT ON
(CTE2.ORDER_ID=HT.ORDER_ID AND CTE2.account_id=HT.account_id) Where
CTE2.EMAIL IN (SELECT NEW_EMAIL as EMAIL FROM maintable
UNION ALL
SELECT EMAIL as EMAIL FROM maintable)
AND HT.ds>= dateadd(year, -2, current_date) --filter date partition of huge.table )
SELECT
mt.metric1,
mt.metric2,
mt.metric3,
mt.metric4,
mt.metric5,
mt.metric6,
mt.metric7,
mt.metric8,
mt.metric9,
mt.metric10,
mt.metric11,
mt.metric12,
mt.metric13,
mt.metric14,
ot.metric1,
CTE2.CONTACT_ID
FROM maintable as L
JOIN CTE2 U ON lower(CTE2.EMAIL)=(case when (mt.EMAIL !=CTE2.EMAIL) then NEW_EMAIL END)
JOIN othertable AS ot ON (mt.old_email=ot.email OR mt.new_email=ot.email)
WHERE ot.exist_condition='Y'
AND ot.ACCOUNT_TYPE !='inactive'
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
CodePudding user response:
I think your biggest issue here is the union all and these subselects. In general I would try to lower burden by making smaller temporary tables. If you have one huge table make it compact step by step. ¨
Try how long will take to execute only for example this part of the query
SELECT CTE2.EMAIL, CTE2.CONTACT_ID FROM
Contact_details_table JOIN huge.table HT ON
(CTE2.ORDER_ID=HT.ORDER_ID AND CTE2.account_id=HT.account_id) Where
CTE2.EMAIL IN (SELECT NEW_EMAIL as EMAIL FROM maintable
UNION ALL
SELECT EMAIL as EMAIL FROM maintable)
AND HT.ds>= dateadd(year, -2, current_date)
If this takes forever make it into one or two separate temporary tables before working with it.
CodePudding user response:
If the table is huge and you are attempting to surround it with a CTE then you will lose all indexes and statistics and it will become a heap. How big is your table? If it is huge, then you could gain traction by declaring a #table and inserting the data and then add appropriate indexes. Then do your query, it is worth a try out.