Home > other >  SQL Delete duplicates if the date matches
SQL Delete duplicates if the date matches

Time:12-13

I have a simple table:

Create Table data
( 
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Price Int,             
    Date date,
    Tmstmp date
)

Populating Table with values:

INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (1, 'John', 'Snow', 100, '2020-04-08', '2020-01-01 10:34:09.000');
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (2, 'Tony', 'Stark', 100, '2020-05-10', '2020-01-01 11:34:09.000');
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (3, 'Tony', 'Stark', 120, '2021-12-11', '2020-01-01 12:34:09.000');
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (4, 'Tony', 'Stark', 125, '2021-12-11', '2020-01-01 13:34:09.000');

enter image description here

The goal is to preserve only the values with the latest date. But only if the date is duplicated. So that Tony Stark will have 2 records with 2 dates. The duplicated row with the same date 2021-12-11 and the oldest timestamp will be deleted.

So The records will look like this:

enter image description here

I tried self join on tables but couldn't make it work. Tried also using ROW_NUMBER but im confused how the Partition By works. Any help to achieve my goal will be very much appreciated.

What I tried to do but havent been successful

DELETE FROM data d1
INNER JOIN data d2
ON d1.personID = d2.personID
AND d1.Date = d2.Date
WHERE d1.Tmstmp < d2.Tmstmp

CodePudding user response:

In most modern RDBMS you can use an updatable CTE in conjunction with row number

with d as (
    select *,
      Row_Number() over(partition by firstname, lastname, date order by Tmstmp desc) rn
    from data
)
delete from d
where rn>1
  • Related