I need to create a stored procedure in SQL Server and implement an upsert such that it will move data from a staging table (a.k.a Source) to final table (a.k.a target) and mark rows which are new, updated, not updated or deleted every time a new batch of data comes in. I am using merge as explained
Notice row number 4 and 5. There has been no change in the input row data for Anup still I get the [is_deleted] column as "Updated". I want it to be something like "Existing" or "No change".
Please help in making this possible. This upsert logic is a part of a big pipeline and we need the rows that are updated, new, not updated or deleted in the new file. How do I achieve this?
CodePudding user response:
You can add a case statement to the is_deleted field in the update to check whether anything has changed. Something like this:
MERGE [dbo].[employee] as Target
USING [dbo].[employee_stagging] as Source
ON Source.[first_name] = Target.[first_name] and
Source.[last_name] = Target.[last_name] and
Source.[dob] = Target.[dob]
WHEN MATCHED
THEN
UPDATE
SET Target.[salary] = Source.[salary],
Target.[current_address] = Source.[current_address],
Target.[is_deleted] = CASE WHEN Source.salary = Target.salary
AND Source.current_address = Target.current_address THEN 'No change'
ELSE 'Updated'
END,
Target.[processed_date] = @current_time
WHEN NOT MATCHED BY Target
THEN
INSERT ([first_name],
[last_name],
[dob],
[salary],
[current_address],
[is_deleted],
[processed_date])
VALUES (Source.[first_name],
Source.[last_name],
Source.[dob],
Source.[salary],
Source.[current_address],
'New',
@current_time
);
So if the update fields are identical to the staged data, it will update is_deleted to "No change", whereas if the update fields have changed it will update is_deleted to "Updated".
NOTE: This code is assuming salary and current_address are non-nullable fields (table definition for field is followed by NOT NULL). If there ARE nullable, then you should handle the null values by replacing the CASE statement with the following:
CASE WHEN (Source.salary = Target.salary
OR Source.salary IS NULL
AND Target.salary IS NULL)
AND (Source.current_address = Target.current_address
OR Source.current_address IS NULL
AND Target.current_address IS NULL) THEN 'No change'
ELSE 'Updated'
END
(Documentation here)