Home > database >  Remove partially duplicated rows in SQL Server Express
Remove partially duplicated rows in SQL Server Express

Time:10-19

Got a table with several columns, with date at the end. It looks like:

Col1   Col2   Col3   date
----- ------ ------ ------------
  x     y      z     2022-10-01
  x     y      z     2022-10-10
  a     b      c     2022-10-01
  a     b      b     2022-10-10
  w     u      c     2022-10-15

What I'm trying to do is remove duplicates based on first three columns. With latest date left in column 4.

Tried to list it with:

Select col1, col2, col3, count(*) as counter 
from database
group by col1, col2, col3, date 
having count (*) > 1;

It's not working because it counts each row, including different dates.. Haven't found any other clues

Expected output is:

Col1 Col2 Col3 Date
x y z 2022-10-10
a b c 2022-10-10
w u c 2022-10-15

CodePudding user response:

You can use a common table expression and ROW_NUMBER to achieve this:

WITH cte 
AS
(
SELECT col1, col2, col3, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY date) as rn
from database
)
DELETE cte
WHERE rn>1;

CodePudding user response:

Just aggregate by the first 3 columns and take the max of the fourth:

SELECT col1, col2, col3, MAX(date) AS date
FROM yourTable
GROUP BY col1, col2, col3;
  • Related