Home > Net >  TSQL Combine multiple sub queries on same table
TSQL Combine multiple sub queries on same table

Time:11-25

I've been trying to improve a SQL query which uses multiple sub queries over the same table but with different conditions and only retrieves the first result from each sub queries.

I will try to simplify the use-case :

I have a table Products like this:

Product_id reference field3 field 4
1 ref1 val1 val3
2 ref2 val2 val4

And another table History:

History_id reference utilcode physicalcode issue media datetime
1 ref1 'test' 'TST' '0' '&audio' 'a_date'
2 ref2 'phone' 'CALLER' '1' '&video' 'a_date'
3 ref2 'test' 'CALLER' '2' '&test' 'a_date'

History is a log table and therefore contains a lot of values.

Now I have a query like this

SELECT 
    p.reference,
    p.field3, p.field4,
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND physicalcode = 'TST' 
       AND issue = 0 
     ORDER BY a_date DESC) AS latest_date_issue_0,
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND physicalcode = 'TST' 
       AND issue = 1 
     ORDER BY a_date DESC) AS latest_date_issue_1
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND utilcode = 'phone' 
     ORDER BY a_date DESC) AS latest_date_phone,
    (SELECT TOP 1 media 
     FROM history h 
     WHERE h.reference = p.reference 
       AND utilcode = 'phone' 
     ORDER BY a_date DESC) AS latest_media,
    -- and so on with many possible combinations
    -- Note that there are more than this few fields on the tables I work on.
WHERE
    p.field3 = 'valX',
    p.field4 = 'valY'
FROM
    products p

How could I merge every sub selects ? Or even a few that are alike to improve the performance ?

History being a very big table, selecting over it multiple times drastically slows down the query.

The main problem being that I only need the first value every time.

Thank you for your time and I hope to find a better way to deal with this issue!

I tried to use ROW_NUMBER() but I could not find a suitable way to use it.
I also tried to create a tmp table using WITH to group every possibility from history but it was worse.

EDIT : Execution plan https://www.brentozar.com/pastetheplan/?id=Sy1AKIsUs

CodePudding user response:

You can convert your correlated subqueries (you call them "subselects") to independent subqueries, then JOIN them. That way each subquery will only need to run once. I'll show you how to do this for your first subquery.

Here's a subquery replacing your first subquery.

SELECT reference, MAX(a_date) a_date
FROM history
WHERE  physicalcode = 'TST' 
AND issue = 0
GROUP BY reference

This gives a virtual table containing the latest date for each reference number from the history table matching the criteria in your question. A multicolumn index on history (physicalcode, issue, reference, a_date) makes this fast.

Then you can join it to the main table something like this:

SELECT 
    p.reference,
    p.field3, p.field4,
    a.a_date a_date_issue_0
FROM products p
LEFT JOIN ( /*the subquery */
           SELECT reference, MAX(a_date) a_date
           FROM history
           WHERE  physicalcode = 'TST' 
           AND issue = 0
           GROUP BY reference
   ) a ON p.reference=a.reference

These subqueries can also be defined as VIEWs or Common Table Expressions (CTEs). If you have many of them you'll probably find it easier to read and reason about your query by doing them that way.

Your last subquery is a little trickier to handle this way. I suggest you work with this answer and then maybe ask another question about that.

CodePudding user response:

Thanks to @O.Jones I've been able to find a way to improve this query.

What I did to merge a few requests was to use a CTE like so :

From

SELECT
(SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND physicalcode = 'TST' 
       AND issue = 0 
     ORDER BY a_date DESC) AS latest_date_issue_0,
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND physicalcode = 'TST' 
       AND issue = 1 
     ORDER BY a_date DESC) AS latest_date_issue_1
     (SELECT top 1 a_date 
      FROM history h 
      WHERE h.reference = p.reference 
        AND h.physicalcode = 'TSTKO' 
      ORDER BY h.d_systeme DESC ) AS d_tst_ko,
     (SELECT top 1 a_date 
       FROM history h 
       WHERE h.reference = p.reference 
         AND h.physicalcode = 'CALLERID' 
       ORDER BY h.d_systeme DESC ) AS d_wrong_number
FROM products p

To

WITH physicalcode_cte (reference, physicalcode, issue, a_date)  as
    (
        SELECT reference, physicalcode, issue, max(a_date)
        from historique
        where codephysique in ('TST','TSTKO','CALLERID')
        and a_date > dateadd(month, -4, getdate()) -- filter on date range to reduce number of rows
        group by reference, physicalcode, issue
    )
SELECT
date_issue_0.a_date,
date_issue_1.a_date,
tst_ko.a_date,
wrong_number.a_date
FROM products p
LEFT JOIN physicalcode_cte date_issue_0 on p.reference = date_issue_0.reference 
        AND date_issue_0.codephysique = 'TST' 
        AND date_issue_0.anomalie = 0
LEFT JOIN physicalcode_cte date_issue_1 on p.reference = date_issue_1.reference
        AND date_issue_1.codephysique = 'TST' 
        AND date_issue_1.anomalie = 1
LEFT JOIN physicalcode_cte tst_ko on p.reference = tst_ko.reference
        AND tst_ko.codephysique = 'TST' 
LEFT JOIN physicalcode_cte wrong_number on p.reference = wrong_number.reference AND 
        AND wrong_number.codephysique = 'TST' 

I've applied this idea for different scenarii and made 2 CTE. I couldn't merge everything, sometime merging caused cost increase. But after several tests I've been able to go from 7100 total cost to 2100.
It is still a lot but 3 times less anyway. Takes 5 seconds instead of a timeout.
It's a query used for monthly reports so I don't need it to be super fast, I will keep it that way.

Thanks you!

  • Related