Home > front end >  How to delete rows that have duplicate column combination
How to delete rows that have duplicate column combination

Time:04-17

I need to delete some rows that have the same column combination (except ID). The row that should be kept is the one that has the maximum reference date

ID      Column1 Column2   RefDate
GUID1   34578   BRKNRI    2018-05-03
GUID2   34578   BRKNRI    2018-05-02
GUID3   12381   BRSAPR    2018-05-03
GUID4   12381   BRSAPR    2018-05-02
GUID5   12381   BRSAPR    2018-05-01

So, after the query the table should be like this

ID      Column1 Column2   RefDate
GUID1   34578   BRKNRI    2018-05-03
GUID3   12381   BRSAPR    2018-05-03

I know that the query below will return the table I want, but I don't know how to delete the "duplicate" entries and leave only the one with the maximum date.

SELECT Column1,
       Column2,
       max(RefDate) as MaxDate
FROM Table
GROUP BY Column1, Column2)

CodePudding user response:

You can try to use delete JOIN

DELETE t1
FROM [Table] t1
INNER JOIN (
  SELECT Column1,
       Column2,
       max(RefDate) as MaxDate
  FROM [Table]
  GROUP BY Column1, Column2
) t2
  ON t1.Column1 = t2.Column1
 AND t1.Column2 = t2.Column2
 AND t1.RefDate <> t2.MaxDate

or use EXISTS subquery.

DELETE t1
FROM [Table] t1
WHERE EXISTS (
  SELECT 1
  FROM [Table] t2
  WHERE t1.Column1 = t2.Column1
  AND t1.Column2 = t2.Column2
  HAVING max(t2.RefDate) <> t1.RefDate
)

sqlfiddle

CodePudding user response:

You should verify first that your select really finds exactly those rows you don't want to delete. If this is the case, you can delete all other rows using following command:

    DELETE FROM yourtable where id NOT IN
    (SELECT id FROM 
    (SELECT id,
    Column1,
    Column2,
    max(RefDate) as MaxDate
    FROM yourtable
    GROUP BY Column1, Column2))

CodePudding user response:

You can use ROW_NUMBER() to partition the data based on Column1 ,Column2 and after that filter the data where ROW_NUMBER() value is greater than 1 to get the duplicate records.

Delete those idS.

DELETE from ID where id IN
(
select  id FROM 
(
select id, Column1 ,Column2 ,RefDate,
ROW_NUMBER() OVER( PARTITION BY Column1 ,Column2 ORDER BY Column1 ,Column2) AS temp_var
from id
)  t
where t.temp_var >1 ) ;
  • Related