Home > database >  Delete duplicate records when joining 2 tables
Delete duplicate records when joining 2 tables

Time:06-15

I have to join 2 tables and check if there are any duplicate records. If there are any I need to delete the duplicate records from dbo table

stage table

market cost ra file_id 
CA     32   2   200
CA     44   5   200
TX     22   2   200

dbo table

market cost ra  file_id
CA     72   9    100
CA     44   5    100
TX     22   2    100

When I join stage and dbo table, I want to delete the duplicate records

Output I want to see is:

dbo table

market cost ra  file_id
CA     72   9    100

I tried query:

    select s.market, s.cost,s.ra , s.[file_id] ,count(*)
    from stage table s
    join dbo table d on s.market=d.market and  s.cost=d.cost, s.ra=d.ra
    group by  s.market, s.cost,s.ra , s.[file_id]    
    having count(*) > 1;

Once I delete the duplicate records I will ingest the staged data to dbo.

Can anyone help me with delete statement.

CodePudding user response:

One solution is to use EXISTS

DELETE 
FROM dbo_table
WHERE  EXISTS (
    select 1
    from stage_table s
    WHERE s.market = dbo_table.market AND  dbo_table.cost = s.cost AND  s.ra=dbo_table.ra)
    
    ;
GO
rows affected
SELECT * FROM dbo_table
GO
market | cost | ra | file_id
:----- | ---: | -: | ------:
CA     |   72 |  9 |     100

db<>fiddle here

CodePudding user response:

I try to guess the requirement a bit since in my eyes you are not being very clear. You try to describe your way of implementing a solution.
It would be better to state first and in short what you want to achieve.

You also do not define clearly what you mean by duplicate record. I guess when the columns market, cost and ra have equal values (no matter the other column(s), then you consider it a duplicate.

My guess is:

  • You want to insert data from staging into dbo
  • The problem is some rows are duplicate, which you would like to overwrite during the process or delete before inserting them

You can delete them before you do the insertion.

delete from dbo d where exists (select 'X' from staging s where s.market = d.market and s.cost = d.cost and s.ra = d.ra)

Then you do the insertion.

insert into dbo select market, cost, ra, file_id from staging

I trust you can understand the SQL statements, so I won't explain them here. Feel free to comment otherwise.

  • Related