I've been tasked with identifying and removing the duplicate entries from a really messy asset list. It's a mess because, over the years, different users / developers have tried to use alternate unique identification setups that, because of the absence of any primary keys, kind of worked if you weren't paying attention to the data.
I can find unique rows (in a broad sense) by using:
SELECT DISTINCT
AssetId, AssetNo, PPH, Testdate
FROM dbo.AssetList;
And now I'm trying to expand the query to allow me to select every column in the dataset. I thought this would work:
SELECT *
FROM dbo.AssetList
WHERE AssetId IN (
SELECT DISTINCT
AssetId, AssetNo, PPH, Testdate
FROM dbo.AssetList
);
But I was wrong. Can anyone explain to me how I can achieve this in SQL Server?
CodePudding user response:
SELECT A.* FROM dbo.AssetList AS A
JOIN
(
SELECT DISTINCT
AssetId, AssetNo, PPH, Testdate
FROM dbo.AssetList
)Q ON A.AssetId =Q.AssetId
Not sure what you want to achieve,but please try the above
CodePudding user response:
Use Row_Number()
SELECT *
FROM (
SELECT *,
RN=ROW_NUMBER()
OVER (PARTITION BY AssetId, AssetNo, PPH, Testdate, add_more_columns
ORDER BY TestDate --or some other suitable column)
FROM dbo.AssetList
) X
WHERE X.RN > 1 ;