Home > Mobile >  DataBricks and MERGE INTO: How to use two columns as the merge key?
DataBricks and MERGE INTO: How to use two columns as the merge key?

Time:05-29

In this example: https://docs.databricks.com/_static/notebooks/merge-in-scd-type-2.html, a single natural key is used to perform the MERGE INTO logic, like so:

 
MERGE INTO customers
USING (
   -- These rows will either UPDATE the current addresses of existing customers or INSERT the new addresses of new customers
  SELECT updates.customerId as mergeKey, updates.*
  FROM updates
  
  UNION ALL
  
  -- These rows will INSERT new addresses of existing customers 
  -- Setting the mergeKey to NULL forces these rows to NOT MATCH and be INSERTed.
  SELECT NULL as mergeKey, updates.*
  FROM updates JOIN customers
  ON updates.customerid = customers.customerid 
  WHERE customers.current = true AND updates.address <> customers.address 
  
) staged_updates
ON customers.customerId = mergeKey
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN  
  UPDATE SET current = false, endDate = staged_updates.effectiveDate    -- Set current to false and endDate to source's effective date.
WHEN NOT MATCHED THEN 
  INSERT(customerid, address, current, effectivedate, enddate) 
  VALUES(staged_updates.customerId, staged_updates.address, true, staged_updates.effectiveDate, null) -- Set current to true along with the new address and its effective date.
 

How could I use a second column, in addition to customerId, to act as a mergeKey in this instance?

CodePudding user response:

Just combine them using AND:

ON customers.customerId = staged_updates.customerId 
  AND customers.<second_column> = staged_updates.<second_column>

it's the same as when you do JOIN between two tables - you need to provide join condition

  • Related