Home > Enterprise >  Optimzing this SQL query
Optimzing this SQL query

Time:08-13

This takes about 2h to load and it's too large for me to use it in my BI tool. The culprit is huge_table. I've added a date partition (ds) to huge_table but it still takes really long to load. Please help me optimize the query below:

        WITH huge.table AS (
             SELECT distinct ORDER_ID, account_id, ds
             FROM "huge.table"
             WHERE yesno_condition='y'   AND days_active>0 )
         
         , CTE2 AS (   SELECT CTE2.EMAIL, CTE2.CONTACT_ID   FROM
         Contact_details_table   JOIN huge.table HT ON
         (CTE2.ORDER_ID=HT.ORDER_ID AND CTE2.account_id=HT.account_id)   Where
         CTE2.EMAIL IN (SELECT NEW_EMAIL as EMAIL FROM maintable
                         UNION ALL 
                         SELECT EMAIL as EMAIL FROM maintable)
             AND HT.ds>= dateadd(year, -2, current_date) --filter date partition of huge.table )
         
         SELECT  
        mt.metric1, 
        mt.metric2, 
        mt.metric3, 
        mt.metric4, 
        mt.metric5,
        mt.metric6, 
        mt.metric7, 
        mt.metric8, 
        mt.metric9, 
        mt.metric10,
        mt.metric11, 
        mt.metric12, 
        mt.metric13, 
        mt.metric14, 
        ot.metric1,
        CTE2.CONTACT_ID 
 
      FROM  maintable as L 
      JOIN CTE2 U ON lower(CTE2.EMAIL)=(case when (mt.EMAIL !=CTE2.EMAIL) then NEW_EMAIL END) 
      JOIN othertable AS ot ON (mt.old_email=ot.email OR mt.new_email=ot.email) 
WHERE ot.exist_condition='Y'  
AND ot.ACCOUNT_TYPE !='inactive' 
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19

CodePudding user response:

I think your biggest issue here is the union all and these subselects. In general I would try to lower burden by making smaller temporary tables. If you have one huge table make it compact step by step. ¨

Try how long will take to execute only for example this part of the query

SELECT CTE2.EMAIL, CTE2.CONTACT_ID   FROM
         Contact_details_table   JOIN huge.table HT ON
         (CTE2.ORDER_ID=HT.ORDER_ID AND CTE2.account_id=HT.account_id)   Where
         CTE2.EMAIL IN (SELECT NEW_EMAIL as EMAIL FROM maintable
                         UNION ALL 
                         SELECT EMAIL as EMAIL FROM maintable)
             AND HT.ds>= dateadd(year, -2, current_date)

If this takes forever make it into one or two separate temporary tables before working with it.

CodePudding user response:

If the table is huge and you are attempting to surround it with a CTE then you will lose all indexes and statistics and it will become a heap. How big is your table? If it is huge, then you could gain traction by declaring a #table and inserting the data and then add appropriate indexes. Then do your query, it is worth a try out.

  •  Tags:  
  • sql
  • Related