The Problem: How can I combine two SQL conditions for matching/joining into one statement? I want both missing records, but also if the record I have is outdated, I want to grab the latest and greatest. I have done with two separate SQL queries and then used a UNION
on the DISTINCT
Id
's but I would prefer to combine this into one query if possible. I'm struggling with the approach so any help is appreciated. Thank you!
Source Table
id | modified_at |
---|---|
a | 2023-10-02 16:42 |
b | 2023-10-03 16:42 |
c | 2023-10-04 16:42 |
d | 2023-10-05 16:42 |
e | 2023-10-06 16:42 |
f | 2023-10-07 16:42 |
g | 2023-10-08 16:42 |
h | 2023-10-09 16:42 |
i | 2023-10-10 16:32 |
j | 2023-10-11 16:52 |
Destination Table
id | modified_at |
---|---|
a | 2023-10-02 16:42 |
b | 2023-10-03 16:42 |
c | 2023-10-04 16:42 |
d | 2023-10-05 16:42 |
h | 2023-10-09 16:42 |
i | 2023-10-10 16:32 |
j | 2023-09-11 16:52 |
Desired Result from SELECT
id | modified_at |
---|---|
e | 2023-10-06 16:42 |
f | 2023-10-07 16:42 |
g | 2023-10-08 16:42 |
j | 2023-10-11 16:52 |
Rationale: e
, f
, and g
are missing records in destination
, so we want to include them in our result set. Last but not least j
is included because source
has a more recent record than destination
due to the modified_at
timestamp, so we want to include it to get the latest and greatest. My current attempts are below:
Technical Limitation: Not able to use transactions, if statements, loops, inserts, or updates. I'm strictly limited to select statements, joins, crosses, etc..
SQL: All records that changed
SELECT
id,
modified_at
FROM source
WHERE source.modified_at >= (SELECT MAX(modified_at) AS last_modified_at FROM destination)
SQL: All missing records
SELECT
source.id,
source.modified_at
FROM source
LEFT JOIN destination
ON source.Id = destination.id
WHERE destination.id IS NULL
CodePudding user response:
When asking questions like this it's helpful to provide the DDL/DML:
DECLARE @SourceTable TABLE (ID VARCHAR(1), ModifiedDateTime DATETIME);
INSERT INTO @SourceTable (ID, ModifiedDateTime) VALUES
('a', '2023-10-02 16:42'), ('b', '2023-10-03 16:42'), ('c', '2023-10-04 16:42'), ('d', '2023-10-05 16:42'),
('e', '2023-10-06 16:42'), ('f', '2023-10-07 16:42'), ('g', '2023-10-08 16:42'), ('h', '2023-10-09 16:42'),
('i', '2023-10-10 16:32'), ('j', '2023-10-11 16:52');
DECLARE @DestinationTable TABLE (ID VARCHAR(1), ModifiedDateTime DATETIME);
INSERT INTO @DestinationTable (ID, ModifiedDateTime) VALUES
('a', '2023-10-02 16:42'), ('b', '2023-10-03 16:42'), ('c', '2023-10-04 16:42'), ('d', '2023-10-05 16:42'),
('h', '2023-10-09 16:42'),
('i', '2023-10-10 16:32'), ('j', '2023-09-11 16:52');
As @Dai suggests, you can either use an OR
or a UNION ALL
here:
/* OR */
SELECT *
FROM @SourceTable s
LEFT OUTER JOIN @DestinationTable d
ON s.ID = d.ID
WHERE d.ID IS NULL
OR s.ModifiedDateTime > d.ModifiedDateTime;
/* UNION ALL */
SELECT *
FROM @SourceTable s
LEFT OUTER JOIN @DestinationTable d
ON s.ID = d.ID
WHERE d.ID IS NULL
UNION ALL
SELECT *
FROM @SourceTable s
LEFT OUTER JOIN @DestinationTable d
ON s.ID = d.ID
WHERE s.ModifiedDateTime > d.ModifiedDateTime;
ID | ModifiedDateTime | ID | ModifiedDateTime |
---|---|---|---|
e | 2023-10-06 16:42:00.000 | ||
f | 2023-10-07 16:42:00.000 | ||
g | 2023-10-08 16:42:00.000 | ||
j | 2023-10-11 16:52:00.000 | j | 2023-09-11 16:52:00.000 |
CodePudding user response:
Isn't this a simple OR:
SELECT
source.id,
source.modified_at
FROM source
LEFT JOIN destination
ON source.Id = destination.id
WHERE (
destination.id IS NULL
OR source.modified_at >= destination.modified_at
)
Your code does some sort of MAX so they're not completely equavalent, but i think the above is the canonical version of missing / new changes