I got two tables. I wanted to get rid of the duplicates only if NAME, ID AND LOC matches. If Name,ID and Loc match exists in both the tables then i wanted to pick the record from the latest month (tb1_march in this case). My input table also has dupes!
TB1_FEB
|NAME | ID | DETAIL | LOC | AGE |
|MARK | 1 | EMAIL | USA | 34 |
|JEN | 2 | PHONE | USA | 32 |
|ROGER| 6 | EMAIL | UK | 34 |
TB2_MAR
|NAME | ID | DETAIL | LOC | AGE |
|MARK | 1 | PHONE | USA | 35 |
|JEN | 2 | PHONE | USA | 32 |
|STEVE| 8 | EMAIL | UK | 29 |
DESIRED OUTPUT:
|NAME | ID | DETAIL | LOC | AGE |
|MARK | 1 | PHONE | USA | 35 |
|JEN | 2 | PHONE | USA | 32 |
|ROGER| 6 | EMAIL | UK | 34 |
|STEVE| 8 | EMAIL | UK | 29 |
What i tried:
WITH SAMPLE_1 AS (
SELECT T2.* FROM TB2_MAR T2 LEFT JOIN TB1_FEB T1 ON
T1.NAME = T2.NAME
AND
T1.ID = T2.ID
AND
T1.LOC = T2.LOC
WHERE T1.ID IS NULL
),
SAMPLE_2 AS (
SELECT * FROM SAMPLE_1
UNION
SELECT * FROM TB1_FEB
)
SELECT * FROM SAMPLE_2
|NAME | ID | DETAIL | LOC | AGE |
|MARK | 1 | EMAIL | USA | 34 |
|MARK | 1 | PHONE | USA | 35 |
|JEN | 2 | PHONE | USA | 32 |
|ROGER| 6 | EMAIL | UK | 34 |
|STEVE| 8 | EMAIL | UK | 29 |
CodePudding user response:
The tables do not seem to contain any column that would help in final sorting so you need to add it; and use window functions to find latest row:
WITH cte1 AS (
SELECT *, 1 AS prioritee
FROM tb2_mar
UNION ALL
SELECT *, 2 AS prioritee
FROM tb1_feb
), cte2 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY name, id, loc ORDER BY prioritee) AS rn
FROM cte1
)
SELECT *
FROM cte2
WHERE rn = 1
CodePudding user response:
As Salman already pointed out, you are missing the month information in your table, so I will add that. Also, the duplicates are a problem, which will require a distinct. Here's my advice:
with u as (
select
2 as [month]
, name
, id
, detail
, loc
, age
from tb_feb
union all
select
3 as [month]
, name
, id
, detail
, loc
, age
from tb_mar
)
select distinct
name
, id
, loc
, last_value(age) over (partition by name,id,loc order by [month] rows between current row and unbounded following) as age
, last_value(detail) over (partition by name,id,loc order by [month] rows between current row and unbounded following) as detail
from u
Of course you need to adapt the select in case you have more columns.