Home > Software engineering >  Detected overlapping in SQL, without using a subquery in the select and without adding duplicate rec
Detected overlapping in SQL, without using a subquery in the select and without adding duplicate rec

Time:09-24

I'm looking to create a query that can detected overlapping in SQL, without using a subquery in the select and without adding duplicate records to the result set.

I know it can be done like this, but considering the amount of records in both tables I'm looking for faster solution.

SELECT x.T1Id
    ,CASE 
        WHEN MaxOtherColumn IS NULL
            THEN 0
        ELSE 1
        END HasOverlapInT2
    ,CASE 
        WHEN MaxOtherColumn IS NULL
            THEN OtherColumn
        ELSE MaxOtherColumn
        END MaxOtherColumn
FROM (
    SELECT CASE 
            WHEN EXISTS (
                    SELECT MAX(T2.OtherColumn)
                    FROM T2
                    WHERE T1.ValidFrom <= T2.ValidUntil
                        AND T2.ValidFrom <= T1.ValidUntil
                        AND t1.T1Id = T2.T1Id
                    )
                THEN 1
            ELSE NULL
            END MaxOtherColumn
        ,T1Id
        ,OtherColumn
    FROM T1
    ) x

I don't want to perform the exists for each row, but process it in batch.

Sample data

Table 1

T1Id (PK) ValidFrom ValidUntil OtherColumn
1 2021-01-01 2021-12-31 1
2 2021-09-23 2021-09-24 2

Table 2

T2Id (PK) T1Id(FK) ValidFrom ValidUntil OtherColumn
1 1 2021-01-01 2021-05-31 9000
2 1 2021-06-01 2021-12-31 9001
3 2 2021-01-01 2021-01-05 7000

Expected result

T1Id HasOverlapInT2 MaxOtherColumn
1 1 9001
2 0 2

SQL server compatibility level = 130 (2016)

CodePudding user response:

You could use CROSS APPLY to achieve your expected result (your current solution doesn't return your expected results):

SELECT x.T1Id,
    CASE 
        WHEN MaxOtherColumn IS NULL
        THEN 0
        ELSE 1
    END HasOverlapInT2,
    CASE 
        WHEN MaxOtherColumn IS NULL
        THEN OtherColumn
        ELSE MaxOtherColumn
    END MaxOtherColumn
FROM
(
    SELECT t.MaxOtherColumn,
        T1Id,
        OtherColumn
    FROM T1
    CROSS APPLY
    (
        SELECT MAX(T2.OtherColumn) MaxOtherColumn
        FROM T2
        WHERE T1.ValidFrom <= T2.ValidUntil
        AND T2.ValidFrom <= T1.ValidUntil
        AND t1.T1Id = T2.T1Id
    ) t
) x

CodePudding user response:

A slightly different way to think about it, in case APPLY is too mind-bending:

;WITH src AS 
(
  SELECT T1.T1Id, 
    T2Other = T2.OtherColumn, 
    T1Other = T1.OtherColumn,
    Overlap = CASE WHEN T2.ValidUntil >= T1.ValidFrom
              AND T2.ValidFrom <= T1.ValidUntil THEN 1 
              ELSE 0 END
  FROM dbo.T1
  INNER JOIN dbo.T2
    ON T1.T1Id = T2.T1Id
)
SELECT T1Id, 
  HasOverlapInT2 = MAX(Overlap), 
  MaxOtherColumn = MAX(CASE Overlap WHEN 1 THEN T2Other ELSE T1Other END)
FROM src
GROUP BY T1Id;

The key is to think about a way to generate the results with a single pass of both. While a correlated subquery doesn't always mean worse performance, it can lead to a scan of one table for every row in the other.

  • Related