Merge into (in SQL), but ignore the duplicates-CodePudding

I try to merge two tables in snowflake with:

On CONCAT(tab1.column1, tab1.column2) = CONCAT(tab1.column1, tab1.column2)

The problem here is that there are duplicates. that means rows where column1 and column2 in table2 are identical. the only difference is the column timestamp. Therefore i would like to have two options: either i ignore the duplicate and take only one row (with the biggest timestamp), or distinguish again based on the timestamp. the second would be nicer

But I have no clue how to do it

Example:

Table1:

Col1 Col2 Col3 Timestamp
24    10    3  05.05.2022
34    19    2  04.05.2022
24    10    4  06.05.2022


Table2:

Col1 Col2 Col3 
24    10    Null  
34    19    Null

What I want to do:

MERGE INTO table1 AS dest USING
(SELECT * FROM table2) AS src
ON CONCAT(dest.col1, dest.col2) = CONCAT(src.col1, src.col2)
WHEN MATCHED THEN UPDATE
SET dest.col3 = src.col3

CodePudding user response：

It feels like you want to update from TABLE1 too TABLE2 not the other way around, because as your example is there is no duplicates.

It also feels like you want to use two equi join's on col1 AND col2 not concat them together:

thus how I see your data, and the words you used, I think you should do this:

create or replace table table1(Col1 number, Col2 number, Col3 number, timestamp date);

insert into table1 values
    (24, 10, 3, '2022-05-05'::date),
    (34, 19, 2, '2022-05-04'::date),
    (24, 10, 4, '2022-05-06'::date);

create or replace table table2(Col1 number, Col2 number, Col3 number);

insert into table2 values
    (24, 10 ,Null),  
    (34, 19 ,Null); 


MERGE INTO table2 AS d
USING (
    select * 
    from table1 
    qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
    ON d.col1 = s.col1 AND  d.col2 = s.col2
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;

which runs fine:

number of rows updated
2

select * from table2;

shows it's has been updated:

COL1	COL2	COL3
24	10	4
34	19	2

but the JOIN being your way work as you have used if that is correct for your application, albeit it feels very wrong to me.

MERGE INTO table2 AS d
USING (
    select * 
    from table1 
    qualify row_number() over (partition by col1, col2 order by timestamp desc) = 1
) AS s
    ON concat(d.col1, d.col2) = concat(s.col1, s.col2)
WHEN MATCHED THEN UPDATE
SET d.col3 = s.col3;

CodePudding user response：

This is it:

WITH CTE AS
(
    SELECT  *,
            RANK() OVER (PARTITION BY col1,col2
                         ORDER BY Timestamp desc) AS rn
    FROM table1
)
UPDATE CTE
SET col3 = (select col3 from table2 where CONCAT(table2.col1,table2.col2) = CONCAT(CTE.col1, CTE.col2))
where CTE.rn =1;