Home > other >  SQL Select missing and updated rows only
SQL Select missing and updated rows only

Time:11-05

The Problem: How can I combine two SQL conditions for matching/joining into one statement? I want both missing records, but also if the record I have is outdated, I want to grab the latest and greatest. I have done with two separate SQL queries and then used a UNION on the DISTINCT Id's but I would prefer to combine this into one query if possible. I'm struggling with the approach so any help is appreciated. Thank you!

Source Table

id modified_at
a 2023-10-02 16:42
b 2023-10-03 16:42
c 2023-10-04 16:42
d 2023-10-05 16:42
e 2023-10-06 16:42
f 2023-10-07 16:42
g 2023-10-08 16:42
h 2023-10-09 16:42
i 2023-10-10 16:32
j 2023-10-11 16:52

Destination Table

id modified_at
a 2023-10-02 16:42
b 2023-10-03 16:42
c 2023-10-04 16:42
d 2023-10-05 16:42
h 2023-10-09 16:42
i 2023-10-10 16:32
j 2023-09-11 16:52

Desired Result from SELECT

id modified_at
e 2023-10-06 16:42
f 2023-10-07 16:42
g 2023-10-08 16:42
j 2023-10-11 16:52

Rationale: e, f, and g are missing records in destination, so we want to include them in our result set. Last but not least j is included because source has a more recent record than destination due to the modified_at timestamp, so we want to include it to get the latest and greatest. My current attempts are below:

Technical Limitation: Not able to use transactions, if statements, loops, inserts, or updates. I'm strictly limited to select statements, joins, crosses, etc..

SQL: All records that changed

SELECT
  id,
  modified_at
FROM source
WHERE source.modified_at >= (SELECT MAX(modified_at) AS last_modified_at FROM destination)

SQL: All missing records

SELECT
  source.id,
  source.modified_at
FROM source
LEFT JOIN destination
  ON source.Id = destination.id
WHERE destination.id IS NULL

CodePudding user response:

When asking questions like this it's helpful to provide the DDL/DML:

DECLARE @SourceTable TABLE (ID VARCHAR(1), ModifiedDateTime DATETIME);
INSERT INTO @SourceTable (ID, ModifiedDateTime) VALUES
('a', '2023-10-02 16:42'), ('b', '2023-10-03 16:42'), ('c', '2023-10-04 16:42'), ('d', '2023-10-05 16:42'), 
('e', '2023-10-06 16:42'), ('f', '2023-10-07 16:42'), ('g', '2023-10-08 16:42'), ('h', '2023-10-09 16:42'), 
('i', '2023-10-10 16:32'), ('j', '2023-10-11 16:52');

DECLARE @DestinationTable TABLE (ID VARCHAR(1), ModifiedDateTime DATETIME);
INSERT INTO @DestinationTable (ID, ModifiedDateTime) VALUES
('a', '2023-10-02 16:42'), ('b', '2023-10-03 16:42'), ('c', '2023-10-04 16:42'), ('d', '2023-10-05 16:42'), 
('h', '2023-10-09 16:42'), 
('i', '2023-10-10 16:32'), ('j', '2023-09-11 16:52');

As @Dai suggests, you can either use an OR or a UNION ALL here:

/* OR */
SELECT *
  FROM @SourceTable s
    LEFT OUTER JOIN @DestinationTable d
      ON s.ID = d.ID
 WHERE d.ID IS NULL
    OR s.ModifiedDateTime > d.ModifiedDateTime;

/* UNION ALL */
SELECT *
  FROM @SourceTable s
    LEFT OUTER JOIN @DestinationTable d
      ON s.ID = d.ID
 WHERE d.ID IS NULL
UNION ALL
SELECT *
  FROM @SourceTable s
    LEFT OUTER JOIN @DestinationTable d
      ON s.ID = d.ID
 WHERE s.ModifiedDateTime > d.ModifiedDateTime;
ID ModifiedDateTime ID ModifiedDateTime
e 2023-10-06 16:42:00.000
f 2023-10-07 16:42:00.000
g 2023-10-08 16:42:00.000
j 2023-10-11 16:52:00.000 j 2023-09-11 16:52:00.000

CodePudding user response:

Isn't this a simple OR:

SELECT
  source.id,
  source.modified_at
FROM source
LEFT JOIN destination
  ON source.Id = destination.id
WHERE (
    destination.id IS NULL
 OR source.modified_at >= destination.modified_at
)

Your code does some sort of MAX so they're not completely equavalent, but i think the above is the canonical version of missing / new changes

  • Related