SQL Server - How to remove duplicates within select query?-CodePudding

How do you delete duplicate records based on multiple paramaters from a select query?

I have a complex query that returns these, and I need to remove duplicates based on IdRow, IdAudience and IdAb, so in the case below I would need to remove Ids 1 and 3:

Id	IdRow	IdAudience	IdAb	Quantity
1	NULL	NULL	2836	1
2	NULL	NULL	2836	1
3	560540	NULL	NULL	77
4	560540	NULL	NULL	77

But how do I do this from a select query?

CodePudding user response：

You could use a simple aggregation approach:

SELECT MAX(Id) AS Id, IdRow, IdAudience, IdAb, Quantity
FROM yourTable
GROUP BY IdRow, IdAudience, IdAb, Quantity
ORDER BY 1;

This assumes that, given a set of 2 or more duplicates, you want to retain the single record with the max Id value.

CodePudding user response：

You could certainly use the ever popular ROW_NUMBER() function to filter out the duplicates as well:

;WITH CTE AS
(
    SELECT * , ROW_NUMBER() OVER(PARTITION BY IdRow, IdAudience,IdAb ORDER BY Id DESC) RN
    FROM yourTable
)

DELETE FROM CTE 
WHERE RN > 1

CodePudding user response：

You can write this in a subquery as below

delete from yourtable where Id not in 
(
SELECT top 1000 MAX(Id) AS Id
FROM yourtable
GROUP BY IdRow, IdAudience, IdAb, Quantity
ORDER BY Id
)