I have a simple table:
Create Table data
(
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Price Int,
Date date,
Tmstmp date
)
Populating Table with values:
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (1, 'John', 'Snow', 100, '2020-04-08', '2020-01-01 10:34:09.000');
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (2, 'Tony', 'Stark', 100, '2020-05-10', '2020-01-01 11:34:09.000');
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (3, 'Tony', 'Stark', 120, '2021-12-11', '2020-01-01 12:34:09.000');
INSERT INTO data (PersonID, LastName, FirstName, Price, date, Tmstmp)
VALUES (4, 'Tony', 'Stark', 125, '2021-12-11', '2020-01-01 13:34:09.000');
The goal is to preserve only the values with the latest date. But only if the date is duplicated.
So that Tony Stark will have 2 records with 2 dates. The duplicated row with the same date 2021-12-11
and the oldest timestamp will be deleted.
So The records will look like this:
I tried self join on tables but couldn't make it work. Tried also using ROW_NUMBER but im confused how the Partition By works. Any help to achieve my goal will be very much appreciated.
What I tried to do but havent been successful
DELETE FROM data d1
INNER JOIN data d2
ON d1.personID = d2.personID
AND d1.Date = d2.Date
WHERE d1.Tmstmp < d2.Tmstmp
CodePudding user response:
In most modern RDBMS you can use an updatable CTE in conjunction with row number
with d as (
select *,
Row_Number() over(partition by firstname, lastname, date order by Tmstmp desc) rn
from data
)
delete from d
where rn>1