Counting duplicate, triplicate, etc. orders that can have multiple lines per order-CodePudding

We have a table of orders that can have multiple lines per order. There are many occasions that duplicate, triplicate, etc. orders can be submitted.

I am using Microsoft SQL Server Management Studio for querying the table. I am no expert in T-SQL and I am having trouble coming up with a query to show the count for the number of duplicate/triplicate orders where BotReady = 'Y'.

Here is an example of the way our orders are stored in our table (dbo.OrderTable). The PurchaseOrderNumber is NOT unique so therefore we use PurchaseOrderNumber and AccountNumber as a super key.

BotReady	PurchaseOrderNumber	AccountNumber	LineNumber	Quantity	UnitPrice	RequestedUOM	Style	Color	ProductType
Y	217344	0026985	1	30	5.40	EA	hwwtg	11028	H
Y	217344	0026985	2	10	7.99	SQF	lr105	00114	C
Y	217344	0026985	1	30	5.40	EA	hwwtg	11028	H
Y	217344	0026985	2	10	7.99	SQF	lr105	00114	C
N	217344	0026985	1	30	5.40	EA	hwwtg	11028	H
N	217344	0026985	2	10	7.99	SQF	lr105	00114	C
Y	BD624919	0210995	1	158	12.80	SHT	551MR	00122	H
Y	BD624919	0210995	1	158	12.80	SHT	551MR	00122	H
Y	BD624919	0210995	1	158	12.80	SHT	551MR	00122	H
Y	BD624920	0210995	1	12	3.80	SQY	211MX	00132	H
N	BD624920	0210995	2	12	0.99	EA	HA258	01088	H
Y	1269407	1911403	1	55	12.99	BOX	HSMPR	00150	H
Y	1269407	1911403	2	2	200.00	ROL	TGJ88	01088	C
Y	1269407	1911403	3	1	10.00	EA	00000	00001	R
Y	1269407	1911403	1	55	12.99	BOX	HSMPR	00150	H
Y	1269407	1911403	2	2	200.00	ROL	TGJ88	01088	C
Y	1269407	1911403	3	1	10.00	EA	00000	00001	R

The SQL should return 4 since there is 1 duplicate order for PurchaseOrderNumber = 217344, 2 duplicates for BD624919, and 1 duplicate for 1269407. So since each order can contain multiple lines per order that can affect how duplicates are counted.

For example, since PurchaseOrderNumber 1269407 has 3 lines in the order that appear twice it still only counts as 1 duplicate order (even if 2 of the 3 lines were duplicated) since they are all apart of the same order.

CodePudding user response：

COUNT and MAX in sub-queries would do the trick:

If you want only number of duplicates to be displayed as 4, the query would be:

select sum(tot) AS FINAL 
from
    (
    select PurchaseOrderNumber, max(cnt) as tot
    from
    (
        select 
             BotReady, 
             PurchaseOrderNumber,
             AccountNumber,
             Linenumber,
             count(*)-1 as cnt
        from t
        where botready='Y'
        group by 
             BotReady, 
             PurchaseOrderNumber,
             AccountNumber,
             Linenumber
             having count(*)>1
    ) as q
group by PurchaseOrderNumber
) as h

If you want which PurchaseOrderNumber had how many duplicates, the query would be:

select PurchaseOrderNumber, max(cnt) as tot
from
(
    select 
         BotReady, 
         PurchaseOrderNumber,
         AccountNumber,
         Linenumber,
         count(*)-1 as cnt
    from t
    where botready='Y'
    group by 
         BotReady, 
         PurchaseOrderNumber,
         AccountNumber,
         Linenumber
         having count(*)>1
) as q
group by PurchaseOrderNumber

Not much different, just final SUM is omitted.

Above queries would handle even if some of linenumbers are not repeated. Try that in fiddle link below.

see DEMO in db<>fiddle

CodePudding user response：

If i undertand well you want to detect duplicate in your table, the same data on all columns, this query can help you :

SELECT BotReady
    ,PurchaseOrderNumber
    ,AccountNumber
    ,LineNumber
    ,Quantity
    ,UnitPrice
    ,RequestedUOM
    ,Style
    ,Color
    ,ProductType
    ,COUNT(1)
FROM dbo.OrderTable
WHERE BotReady = 'Y'
GROUP BY BotReady
   ,PurchaseOrderNumber
   ,AccountNumber
   ,LineNumber
   ,Quantity
   ,UnitPrice
   ,RequestedUOM
   ,Style
   ,Color
   ,ProductType
HAVING COUNT(1) > 1