Home > Blockchain >  Find duplicate rows conditionally in SQL
Find duplicate rows conditionally in SQL

Time:08-12

Problem
I need to filter a set of records conditionally. The rules for filter the data are: find duplicate values by ProductNumber and also they need to have a least two ProductType (1 and 5)

Example
Product

ProductNumber ProductType ProductName
A-000217 1 stuff 13
A-000217 5 stuff 13
A-000086 5 stuff 2
A-000086 5 stuff 2
A-001229 1 stuff 20
A-001146 5 stuff 21
A-001146 1 stuff 21

This should be the output

Product (Filtered)

ProductNumber ProductType ProductName
A-000217 1 stuff 13
A-000217 5 stuff 13
A-001146 5 stuff 21
A-001146 1 stuff 21
  • Product A-000086 has been excluded because both records are type 5.
  • Product A-001229 only has 1 record so it must be excluded.

I tried this code, but it didn't work as expected.

CREATE TABLE #Product(
    [Id] [int] IDENTITY(1,1)  NOT NULL,
    [ProductNumber] [varchar](60) NOT NULL,
    [ProductType] [int] NOT NULL,
    [ProductName] [varchar](60) NOT NULL,
PRIMARY KEY(Id)
);

INSERT INTO [#Product]([ProductNumber], [ProductType], [ProductName])
 VALUES('A-000217', 1, 'stuff 13')
,('A-000217', 5, 'stuff 13')
,('A-000086', 5, 'stuff 2')
,('A-000086', 5, 'stuff 2')
,('A-001229', 1, 'stuff 20')
,('A-001146', 5, 'stuff 21')
,('A-001146', 1, 'stuff 21')

--Filter data
; WITH CTE AS (
SELECT 
      ProductNumber 
    , ProductType 
    , ProductName
    , COUNT(ProductType) OVER(PARTITION BY ProductNumber) AS [RowNumber]
 FROM #Product
)
SELECT * FROM CTE WHERE RowNumber > 1

Result:

output

What's wrong with my code? Or how can I achieve this approach?

CodePudding user response:

GROUP BYin the CTE with HAVING for total for every product number and also for DISTINCT type yields the result you are searching for

WITH CTE AS
(SELECT
[ProductNumber], [ProductName]
FROM 
[#Product]
GROUP BY [ProductNumber], [ProductName] 
HAVING COUNT(*) > 1 AND COUNT(DISTINCT [ProductType]) > 1)
SELECT p1.[ProductNumber], p1.[ProductType], p1.[ProductName] 
FROM [#Product] p1 JOIN CTE p2 ON p1.[ProductNumber] = p2.[ProductNumber]
ProductNumber | ProductType | ProductName
:------------ | ----------: | :----------
A-000217      |           1 | stuff 13   
A-000217      |           5 | stuff 13   
A-001146      |           5 | stuff 21   
A-001146      |           1 | stuff 21   

db<>fiddle here

CodePudding user response:

This is not very intuitive, but it works. You want ProductNumberCount to be 2 or more (both 1 and 5), but ProductTypeCount to be 1 (only one item of each 1 and 5, per ProductNumber). And perhaps not bad idea to filter only for 1 and 5 in CTE itself.

;WITH CTE AS (
    SELECT 
          ProductNumber 
        , ProductType 
        , ProductName
        , COUNT(ProductNumber) OVER(PARTITION BY ProductNumber) AS [ProductNumberCount]
        , COUNT(ProductType) OVER(PARTITION BY ProductNumber, ProductType) AS [ProductTypeCount]
     FROM #Product
     WHERE ProductType IN (1, 5)
)
SELECT ProductNumber
    ,ProductType
    ,ProductName
FROM CTE WHERE CTE.ProductNumberCount >= 2 AND CTE.ProductTypeCount = 1

Example here

CodePudding user response:

One other way is a correlated query using exists which should be performant:

select * 
from #Product p
where exists (
    select * from #Product p2
    where p2.ProductNumber = p.ProductNumber
    group by p2.productNumber
    having Min(p2.productType) = 1 and Max(p2.productType) = 5
);

Demo DB<>Fiddle

  • Related