Optimzing this SQL query-CodePudding

This takes about 2h to load and it's too large for me to use it in my BI tool. The culprit is huge_table. I've added a date partition (ds) to huge_table but it still takes really long to load. Please help me optimize the query below:

        WITH huge.table AS (
             SELECT distinct ORDER_ID, account_id, ds
             FROM "huge.table"
             WHERE yesno_condition='y'   AND days_active>0 )
         
         , CTE2 AS (   SELECT CTE2.EMAIL, CTE2.CONTACT_ID   FROM
         Contact_details_table   JOIN huge.table HT ON
         (CTE2.ORDER_ID=HT.ORDER_ID AND CTE2.account_id=HT.account_id)   Where
         CTE2.EMAIL IN (SELECT NEW_EMAIL as EMAIL FROM maintable
                         UNION ALL 
                         SELECT EMAIL as EMAIL FROM maintable)
             AND HT.ds>= dateadd(year, -2, current_date) --filter date partition of huge.table )
         
         SELECT  
        mt.metric1, 
        mt.metric2, 
        mt.metric3, 
        mt.metric4, 
        mt.metric5,
        mt.metric6, 
        mt.metric7, 
        mt.metric8, 
        mt.metric9, 
        mt.metric10,
        mt.metric11, 
        mt.metric12, 
        mt.metric13, 
        mt.metric14, 
        ot.metric1,
        CTE2.CONTACT_ID 
 
      FROM  maintable as L 
      JOIN CTE2 U ON lower(CTE2.EMAIL)=(case when (mt.EMAIL !=CTE2.EMAIL) then NEW_EMAIL END) 
      JOIN othertable AS ot ON (mt.old_email=ot.email OR mt.new_email=ot.email) 
WHERE ot.exist_condition='Y'  
AND ot.ACCOUNT_TYPE !='inactive' 
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19

CodePudding user response：

I think your biggest issue here is the union all and these subselects. In general I would try to lower burden by making smaller temporary tables. If you have one huge table make it compact step by step. ¨

Try how long will take to execute only for example this part of the query

SELECT CTE2.EMAIL, CTE2.CONTACT_ID   FROM
         Contact_details_table   JOIN huge.table HT ON
         (CTE2.ORDER_ID=HT.ORDER_ID AND CTE2.account_id=HT.account_id)   Where
         CTE2.EMAIL IN (SELECT NEW_EMAIL as EMAIL FROM maintable
                         UNION ALL 
                         SELECT EMAIL as EMAIL FROM maintable)
             AND HT.ds>= dateadd(year, -2, current_date)

If this takes forever make it into one or two separate temporary tables before working with it.

CodePudding user response：

If the table is huge and you are attempting to surround it with a CTE then you will lose all indexes and statistics and it will become a heap. How big is your table? If it is huge, then you could gain traction by declaring a #table and inserting the data and then add appropriate indexes. Then do your query, it is worth a try out.