Home > front end >  Combine multiple updates with conditions, better merge?
Combine multiple updates with conditions, better merge?

Time:10-03

A follow up question to SQL Server Merge: update only changed data, tracking changes?

we have been struggling to get an effective merge statement working, and are now thinking about only using updates, we have a very simple problem: Update Target from Source where values are different and record the changes, both tables are the same layout.

So, the two questions we have are: is it possible to combine this very simple update into a single statement?

UPDATE        tbladsgroups
SET           tbladsgroups.Description = s.Description,
              tbladsgroups.action='Updated'
FROM          tbladsgroups t
INNER JOIN    tbladsgroups_staging s
ON            t.SID = s.SID
Where   s.Description   <> t.Description 

UPDATE        tbladsgroups
SET           tbladsgroups.DisplayName = s.DisplayName,
              tbladsgroups.action='Updated'
FROM          tbladsgroups t
INNER JOIN    tbladsgroups_staging s
ON            t.SID = s.SID
Where   s.DisplayName   <> t.DisplayName

....for each column.

Second question.

Can we record into a separate table/variable which record has been updated?

Merge would be perfect, however we cannot see which record is updated as the data returned from OUTPUT shows all rows, as the target is always updated.

edit complete merge:

M

ERGE tblADSGroups AS TARGET
    USING tblADSGroups_STAGING AS SOURCE
        ON (TARGET.[SID] = SOURCE.[SID])
    WHEN MATCHED 
    THEN UPDATE SET
        
        TARGET.[Description]=CASE
                            WHEN source.[Description] != target.[Description]  THEN(source.[Description] 
                            )
                            ELSE target.[Description] END,
        TARGET.[displayname] = CASE 
                            WHEN source.[displayname] != target.[displayname]  THEN source.[displayname]
                            ELSE target.[displayname] END
        
...other columns cut for brevity


    WHEN NOT MATCHED BY TARGET
        THEN
            INSERT (
                [SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
                )
            VALUES (
                source.[SID],[SamAccountName],[DisplayName],[Description],[DistinguishedName],[GroupCategory],[GroupScope],[Created],[Members],[MemberOf],[SYNCtimestamp],[Action]
                )
 
  
  WHEN NOT MATCHED BY SOURCE
        THEN
            UPDATE SET ACTION='Deleted'

CodePudding user response:

You can use a single UPDATE with an OUTPUT clause, and use an INTERSECT or EXCEPT subquery in the join clause to check whether any columns have changed.

For example

UPDATE t
SET Description = s.Description,
    DisplayName = s.DisplayName,
    action = 'Updated'
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO @tbl (ID, Description, DisplayName)
FROM tbladsgroups t
INNER JOIN tbladsgroups_staging s
  ON t.SID = s.SID
  AND NOT EXISTS (
    SELECT s.Description, s.DisplayName
    INTERSECT
    SELECT t.Description, t.DisplayName
  );

You can do a similar thing with MERGE, if you also want to INSERT

MERGE tbladsgroups t
USING tbladsgroups_staging s
  ON t.SID = s.SID
WHEN MATCHED AND NOT EXISTS (    -- do NOT place this condition in the ON
    SELECT s.Description, s.DisplayName
    INTERSECT
    SELECT t.Description, t.DisplayName
  )
  THEN UPDATE SET
    Description = s.Description,
    DisplayName = s.DisplayName,
    action = 'Updated'
WHEN NOT MATCHED
  THEN INSERT (ID, Description, DisplayName)
       VALUES (s.ID, s.Description, s.DisplayName)
OUTPUT inserted.ID, inserted.Description, inserted.DisplayName
INTO @tbl (ID, Description, DisplayName)
;

CodePudding user response:

We have similar needs when dealing with values in our Data Warehouse dimensions. Merge works fine, but can be inefficient for large tables. Your method would work, but also seems fairly inefficient in that you would have individual updates for every column. One way to shorten things would be to compare multiple columns in one statement (which obviously makes things more complex). You also do not seem to take NULL values into consideration.

What we ended up using is essentially the technique described on this page: https://sqlsunday.com/2016/07/14/comparing-nullable-columns/

Using INTERSECT allows you to easily (and quickly) compare differences between our staging and our dimension table, without having to explicitly write a comparison for each individual column.

To answer your second question, the technique above would not enable you to catch which column changed. However, you can compare the old row vs the new row (we "close" the earlier version of the row by setting a "ValidTo" date, and then add the new row with a "ValidFrom" date equal to today's date.

Our code ends up looking like the following:

  • INSERT all rows from the stage table that do not have a matching key value in the new table (new rows)
  • Compare stage vs dimension using the INTERSECT and store all matches in a table variable
  • Using the table variable, "close" all matching rows in the Dimension
  • Using the table variable, INSERT the new rows
  • If there's a full load taking place, we can also check for Keys that only exist in the dimension but not in the stage table. This would indicate those rows were deleted in the source system, and we mark them as "IsDeleted" in the dimension.

CodePudding user response:

I think you may be overthinking the complexity, but yes. Your underlying update is a compare between the ads group and staging tables based on the matching ID in each query. Since you are already checking the join on ID and comparing for different description OR display name, just update both fields. Why?

groups description   groups display   staging description  staging display
SomeValue            Show Me          SOME other Value     Show Me
Try This             Attempt          Try This             Working on it
Both Diff            Changes          Both Are Diff        Change Me

So the ultimate value you want is to pull both description and display FROM the staging back to the ads groups table.

In the above sample, I have three samples which if based on matching ID present entries that would need to be changed. If the value is the same in one column, but not the other and you update both columns, the net effect is the one bad column that get updated. The first would ultimately remain the same. If both are different, both get updated anyhow.

UPDATE        tbladsgroups
SET           tbladsgroups.Description = s.Description,
              tbladsgroups.DisplayName = s.DisplayName,
              tbladsgroups.action='Updated'
FROM          tbladsgroups t
INNER JOIN    tbladsgroups_staging s
ON            t.SID = s.SID
Where   s.Description   <> t.Description 
   OR   s.DisplayName   <> t.DisplayName

Now, all this resolution being said, you have redundant data and that is the whole point of a lookup table. The staging appears to always have the correct display name and description. Your tblAdsGroups should probably remove those two columns and always get them from staging to begin with... Something like..

select
      t.*,
      s.Description,
      s.DisplayName
   from
      tblAdsGroups t
         JOIN tblAdsGroups_Staging s
            on t.sid = s.sid 

Then you always have the correct description and display name and dont have to keep synching updates between them.

  • Related