Problem
I need to filter a set of records conditionally. The rules for filter the data are: find duplicate values by ProductNumber and also they need to have a least two ProductType (1 and 5)
Example
Product
ProductNumber | ProductType | ProductName |
---|---|---|
A-000217 | 1 | stuff 13 |
A-000217 | 5 | stuff 13 |
A-000086 | 5 | stuff 2 |
A-000086 | 5 | stuff 2 |
A-001229 | 1 | stuff 20 |
A-001146 | 5 | stuff 21 |
A-001146 | 1 | stuff 21 |
This should be the output
Product (Filtered)
ProductNumber | ProductType | ProductName |
---|---|---|
A-000217 | 1 | stuff 13 |
A-000217 | 5 | stuff 13 |
A-001146 | 5 | stuff 21 |
A-001146 | 1 | stuff 21 |
- Product A-000086 has been excluded because both records are type 5.
- Product A-001229 only has 1 record so it must be excluded.
I tried this code, but it didn't work as expected.
CREATE TABLE #Product(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProductNumber] [varchar](60) NOT NULL,
[ProductType] [int] NOT NULL,
[ProductName] [varchar](60) NOT NULL,
PRIMARY KEY(Id)
);
INSERT INTO [#Product]([ProductNumber], [ProductType], [ProductName])
VALUES('A-000217', 1, 'stuff 13')
,('A-000217', 5, 'stuff 13')
,('A-000086', 5, 'stuff 2')
,('A-000086', 5, 'stuff 2')
,('A-001229', 1, 'stuff 20')
,('A-001146', 5, 'stuff 21')
,('A-001146', 1, 'stuff 21')
--Filter data
; WITH CTE AS (
SELECT
ProductNumber
, ProductType
, ProductName
, COUNT(ProductType) OVER(PARTITION BY ProductNumber) AS [RowNumber]
FROM #Product
)
SELECT * FROM CTE WHERE RowNumber > 1
Result:
What's wrong with my code? Or how can I achieve this approach?
CodePudding user response:
GROUP BY
in the CTE
with HAVING
for total for every product number and also for DISTINCT
type yields the result you are searching for
WITH CTE AS (SELECT [ProductNumber], [ProductName] FROM [#Product] GROUP BY [ProductNumber], [ProductName] HAVING COUNT(*) > 1 AND COUNT(DISTINCT [ProductType]) > 1) SELECT p1.[ProductNumber], p1.[ProductType], p1.[ProductName] FROM [#Product] p1 JOIN CTE p2 ON p1.[ProductNumber] = p2.[ProductNumber]
ProductNumber | ProductType | ProductName :------------ | ----------: | :---------- A-000217 | 1 | stuff 13 A-000217 | 5 | stuff 13 A-001146 | 5 | stuff 21 A-001146 | 1 | stuff 21
db<>fiddle here
CodePudding user response:
This is not very intuitive, but it works. You want ProductNumberCount
to be 2 or more (both 1 and 5), but ProductTypeCount
to be 1 (only one item of each 1
and 5
, per ProductNumber
). And perhaps not bad idea to filter only for 1
and 5
in CTE
itself.
;WITH CTE AS (
SELECT
ProductNumber
, ProductType
, ProductName
, COUNT(ProductNumber) OVER(PARTITION BY ProductNumber) AS [ProductNumberCount]
, COUNT(ProductType) OVER(PARTITION BY ProductNumber, ProductType) AS [ProductTypeCount]
FROM #Product
WHERE ProductType IN (1, 5)
)
SELECT ProductNumber
,ProductType
,ProductName
FROM CTE WHERE CTE.ProductNumberCount >= 2 AND CTE.ProductTypeCount = 1
Example here
CodePudding user response:
One other way is a correlated query using exists which should be performant:
select *
from #Product p
where exists (
select * from #Product p2
where p2.ProductNumber = p.ProductNumber
group by p2.productNumber
having Min(p2.productType) = 1 and Max(p2.productType) = 5
);
Demo DB<>Fiddle