Home > Blockchain >  SQL Server - delete duplicate rows of a table that has many-to-many relationship
SQL Server - delete duplicate rows of a table that has many-to-many relationship

Time:11-20

I have in SQL Server a table that has many-to-many relatioships with another tables. So, my situation is: Table A has columns (for example [ID_A], name, surname, ...) Table B has columns (for example [ID_B], field1, field2, field3, ...) Table AB is the relation table between Table A and Table B which has only 2 columns (ID_A, ID_B). The delete rule and update rule are ON CASCADE (the properties update/delete between Table A and Table AB is ON CASCADE; the properties update/delete between Table AB and Table B is ON CASCADE).

But in Table A I have some duplicate rows:

ID_A   Name   Surname
--------------------
1      Alice  Brown
2      Bob    White
3      Alice  Brown
4      Alice  Brown
5      Alice  Brown
6      Bob    White

and every row has in relationship with Table B row. I want this situation in Table A:

 ID_A   Name   Surname
 ---------------------
 1     Alice  Brown
 2     Bob    White

so I want to remove the duplicate row and its relationship with Table B row. Table B has not numeric IDs.

Don't worry about the data, I just want one row, for example one row for Alice Brown, one row for Bob White, and so on. These tables are examples! (I know that two people can have same name and surname :) )

I can find duplicate rows using:

SELECT
    field
FROM
    Table
GROUP BY
    field
HAVING 
    COUNT(*) > 1

But I don't know how to join the DELETE statement with DUPLICATE SEARCH statement. I've just tried to manually delete records in Table A, but the cascade did not work.

Can someone help me? Thank you

CodePudding user response:

Maybe delete the duplicate rows first, like this:

DELETE 
A 
FROM TABLEA A
     INNER JOIN
    (
        SELECT *, 
               RANK() OVER(PARTITION BY name, surname                               
               ORDER BY ID_A) rank
        FROM TABLEA 
    ) T ON A.ID_A = t.ID_A
    WHERE rank > 1;

And then delete rows in your matrix table that no longer exist in Table A.

DELETE FROM TABLEB WHERE ID_A NOT IN(SELECT ID_A FROM TABLEA)

(Note the delete statement may be off syntax-wise as I am typing from phone!)

CodePudding user response:

You can try a subquery:

delete from Table where ID_A in (
  -- use min or max 
  select max(ID_A) from Table
  group by Name, Surname
  having count(*) > 1
)
  • Related