Home > Net >  SQL - Remove duplicates after using a GROUP BY clause
SQL - Remove duplicates after using a GROUP BY clause

Time:04-27

Let's say I had two tables that looked like this:

Prod_SerialNo Prod_TestOnAt Prod_AccountNo
SN0001 2021-04-08 045678
SN0001 2021-01-14 067891
SN0001 2021-11-29 091234
SN0002 2022-01-19 045678
SN0002 2020-07-30 045678
SN0002 2022-03-30 012345
SN0003 2022-04-01 078912
SN0003 2022-02-19 089123
SN0003 2022-03-18 023456
S_AccountNo S_AccountType S_AccountName
012345 Homeowner Adam Smith
023456 Homeowner Jeremy Chan
034567 Manufacturer Anne Hudson
045678 Distributor Barney Jones
056789 Distributor Jasmine Coleman
067891 Homeowner Christian Lewis
078912 Distributor Heather Ogden
089123 Homeowner Stephen Gray
091234 Distributor Antony Newman

The Prod Table tabulates specific product tests by what serial number was used, when the product was tested, and who tested it. (There are other things in the table, including a primary key not shown here)

The S Table is a list of subscribers with a variety of information about them. S_AccountNo is the parent to Prod_AccountNo.

I want to query when the last test was performed for each Serial Number and what account name it was that performed the test, but I don't want multiple results (duplicates) for the same serial number. I have tried the following code:

SELECT 
   Prod_SerialNo, 
   MAX(Prod_TestOnAt) AS "Last Time Tested", 
   S_AccountName 
FROM Prod
INNER JOIN S ON S.S_AccountNo = Prod.Prod_AccountNo
GROUP BY Prod_SerialNo, S_AccountName
ORDER BY Prod_SerialNo

However, the query ends up outputting the same serial number on multiple rows even though I ask for the max TestOnAt date and I group by serial number. What am I getting wrong?

CodePudding user response:

I think there is no need to use Group by you can get result with Row_Number like this:


SELECT 
   t.Prod_SerialNo, 
   t.Prod_TestOnAt AS "Last Time Tested", 
   t.S_AccountName 
 FROM (
SELECT 
   Prod_SerialNo, 
   Prod_TestOnAt,
   S_AccountName,
    ROW_NUMBER() OVER (PARTITION BY Prod_SerialNo ORDER BY Prod_TestOnAt DESC) rw
FROM Prod
INNER JOIN S ON S.S_AccountNo = Prod.Prod_AccountNo
) t
WHERE t.rw=1
ORDER BY t.Prod_SerialNo

CodePudding user response:

You are grouping by Prod_SerialNo, S_AccountName so you will get duplicate Prod_SerialNo if multiple rows exist with that Prod_SerialNo and different S_AccountNames. You could do a MAX on Prod_TestOnAt and get that value with it's Prod_SerialNo, then join the result on the table to get your desired info using a subquery like so:

SELECT
  p.[Prod_SerialNo],
  max.[LastTimeTested],
  s.[S_AccountName]
FROM PROD as p
INNER JOIN
(
    SELECT 
       Prod_SerialNo, 
       MAX(Prod_TestOnAt) as [LastTimeTested]
    FROM Prod
    GROUP BY [Prod_SerialNo]
) as max
on max.[Prod_SerialNo] = p.[Prod_SerialNo] and max.[LastTimeTested] = p.[Prod_TestOnAt]
INNER JOIN S as s
ON s.[S_AccountNo] = p.[Prod_AccountNo]
ORDER BY p.[Prod_SerialNo]
  •  Tags:  
  • sql
  • Related