Using DISTINCT in where clause subquery to find unique column combination-CodePudding

I've been tasked with identifying and removing the duplicate entries from a really messy asset list. It's a mess because, over the years, different users / developers have tried to use alternate unique identification setups that, because of the absence of any primary keys, kind of worked if you weren't paying attention to the data.

I can find unique rows (in a broad sense) by using:

SELECT DISTINCT
    AssetId, AssetNo, PPH, Testdate
FROM dbo.AssetList;

And now I'm trying to expand the query to allow me to select every column in the dataset. I thought this would work:

SELECT *
FROM dbo.AssetList
WHERE AssetId IN (
    SELECT DISTINCT
        AssetId, AssetNo, PPH, Testdate
    FROM dbo.AssetList
);

But I was wrong. Can anyone explain to me how I can achieve this in SQL Server?

CodePudding user response：

SELECT A.* FROM dbo.AssetList AS A
JOIN
(
   SELECT DISTINCT
   AssetId, AssetNo, PPH, Testdate
   FROM dbo.AssetList
)Q ON A.AssetId =Q.AssetId

Not sure what you want to achieve,but please try the above

CodePudding user response：

Use Row_Number()

SELECT *
FROM (
    SELECT *, 
        RN=ROW_NUMBER()
            OVER (PARTITION BY AssetId, AssetNo, PPH, Testdate, add_more_columns
              ORDER BY  TestDate --or some other suitable column)
    FROM dbo.AssetList
) X
WHERE X.RN > 1 ;