I'm looking to create a query that can detected overlapping in SQL, without using a subquery in the select and without adding duplicate records to the result set.
I know it can be done like this, but considering the amount of records in both tables I'm looking for faster solution.
SELECT x.T1Id
,CASE
WHEN MaxOtherColumn IS NULL
THEN 0
ELSE 1
END HasOverlapInT2
,CASE
WHEN MaxOtherColumn IS NULL
THEN OtherColumn
ELSE MaxOtherColumn
END MaxOtherColumn
FROM (
SELECT CASE
WHEN EXISTS (
SELECT MAX(T2.OtherColumn)
FROM T2
WHERE T1.ValidFrom <= T2.ValidUntil
AND T2.ValidFrom <= T1.ValidUntil
AND t1.T1Id = T2.T1Id
)
THEN 1
ELSE NULL
END MaxOtherColumn
,T1Id
,OtherColumn
FROM T1
) x
I don't want to perform the exists for each row, but process it in batch.
Sample data
Table 1
T1Id (PK) | ValidFrom | ValidUntil | OtherColumn |
---|---|---|---|
1 | 2021-01-01 | 2021-12-31 | 1 |
2 | 2021-09-23 | 2021-09-24 | 2 |
Table 2
T2Id (PK) | T1Id(FK) | ValidFrom | ValidUntil | OtherColumn |
---|---|---|---|---|
1 | 1 | 2021-01-01 | 2021-05-31 | 9000 |
2 | 1 | 2021-06-01 | 2021-12-31 | 9001 |
3 | 2 | 2021-01-01 | 2021-01-05 | 7000 |
Expected result
T1Id | HasOverlapInT2 | MaxOtherColumn |
---|---|---|
1 | 1 | 9001 |
2 | 0 | 2 |
SQL server compatibility level = 130 (2016)
CodePudding user response:
You could use CROSS APPLY
to achieve your expected result (your current solution doesn't return your expected results):
SELECT x.T1Id,
CASE
WHEN MaxOtherColumn IS NULL
THEN 0
ELSE 1
END HasOverlapInT2,
CASE
WHEN MaxOtherColumn IS NULL
THEN OtherColumn
ELSE MaxOtherColumn
END MaxOtherColumn
FROM
(
SELECT t.MaxOtherColumn,
T1Id,
OtherColumn
FROM T1
CROSS APPLY
(
SELECT MAX(T2.OtherColumn) MaxOtherColumn
FROM T2
WHERE T1.ValidFrom <= T2.ValidUntil
AND T2.ValidFrom <= T1.ValidUntil
AND t1.T1Id = T2.T1Id
) t
) x
CodePudding user response:
A slightly different way to think about it, in case APPLY
is too mind-bending:
;WITH src AS
(
SELECT T1.T1Id,
T2Other = T2.OtherColumn,
T1Other = T1.OtherColumn,
Overlap = CASE WHEN T2.ValidUntil >= T1.ValidFrom
AND T2.ValidFrom <= T1.ValidUntil THEN 1
ELSE 0 END
FROM dbo.T1
INNER JOIN dbo.T2
ON T1.T1Id = T2.T1Id
)
SELECT T1Id,
HasOverlapInT2 = MAX(Overlap),
MaxOtherColumn = MAX(CASE Overlap WHEN 1 THEN T2Other ELSE T1Other END)
FROM src
GROUP BY T1Id;
- Example db<>fiddle
The key is to think about a way to generate the results with a single pass of both. While a correlated subquery doesn't always mean worse performance, it can lead to a scan of one table for every row in the other.