We have a table of orders that can have multiple lines per order. There are many occasions that duplicate, triplicate, etc. orders can be submitted.
I am using Microsoft SQL Server Management Studio for querying the table. I am no expert in T-SQL and I am having trouble coming up with a query to show the count for the number of duplicate/triplicate orders where BotReady = 'Y'
.
Here is an example of the way our orders are stored in our table (dbo.OrderTable
). The PurchaseOrderNumber
is NOT unique so therefore we use PurchaseOrderNumber
and AccountNumber
as a super key.
BotReady | PurchaseOrderNumber | AccountNumber | LineNumber | Quantity | UnitPrice | RequestedUOM | Style | Color | ProductType |
---|---|---|---|---|---|---|---|---|---|
Y | 217344 | 0026985 | 1 | 30 | 5.40 | EA | hwwtg | 11028 | H |
Y | 217344 | 0026985 | 2 | 10 | 7.99 | SQF | lr105 | 00114 | C |
Y | 217344 | 0026985 | 1 | 30 | 5.40 | EA | hwwtg | 11028 | H |
Y | 217344 | 0026985 | 2 | 10 | 7.99 | SQF | lr105 | 00114 | C |
N | 217344 | 0026985 | 1 | 30 | 5.40 | EA | hwwtg | 11028 | H |
N | 217344 | 0026985 | 2 | 10 | 7.99 | SQF | lr105 | 00114 | C |
Y | BD624919 | 0210995 | 1 | 158 | 12.80 | SHT | 551MR | 00122 | H |
Y | BD624919 | 0210995 | 1 | 158 | 12.80 | SHT | 551MR | 00122 | H |
Y | BD624919 | 0210995 | 1 | 158 | 12.80 | SHT | 551MR | 00122 | H |
Y | BD624920 | 0210995 | 1 | 12 | 3.80 | SQY | 211MX | 00132 | H |
N | BD624920 | 0210995 | 2 | 12 | 0.99 | EA | HA258 | 01088 | H |
Y | 1269407 | 1911403 | 1 | 55 | 12.99 | BOX | HSMPR | 00150 | H |
Y | 1269407 | 1911403 | 2 | 2 | 200.00 | ROL | TGJ88 | 01088 | C |
Y | 1269407 | 1911403 | 3 | 1 | 10.00 | EA | 00000 | 00001 | R |
Y | 1269407 | 1911403 | 1 | 55 | 12.99 | BOX | HSMPR | 00150 | H |
Y | 1269407 | 1911403 | 2 | 2 | 200.00 | ROL | TGJ88 | 01088 | C |
Y | 1269407 | 1911403 | 3 | 1 | 10.00 | EA | 00000 | 00001 | R |
The SQL should return 4 since there is 1 duplicate order for PurchaseOrderNumber = 217344
, 2 duplicates for BD624919
, and 1 duplicate for 1269407
. So since each order can contain multiple lines per order that can affect how duplicates are counted.
For example, since PurchaseOrderNumber 1269407 has 3 lines in the order that appear twice it still only counts as 1 duplicate order (even if 2 of the 3 lines were duplicated) since they are all apart of the same order.
CodePudding user response:
COUNT
and MAX
in sub-queries would do the trick:
If you want only number of duplicates to be displayed as 4
, the query would be:
select sum(tot) AS FINAL
from
(
select PurchaseOrderNumber, max(cnt) as tot
from
(
select
BotReady,
PurchaseOrderNumber,
AccountNumber,
Linenumber,
count(*)-1 as cnt
from t
where botready='Y'
group by
BotReady,
PurchaseOrderNumber,
AccountNumber,
Linenumber
having count(*)>1
) as q
group by PurchaseOrderNumber
) as h
If you want which PurchaseOrderNumber
had how many duplicates, the query would be:
select PurchaseOrderNumber, max(cnt) as tot
from
(
select
BotReady,
PurchaseOrderNumber,
AccountNumber,
Linenumber,
count(*)-1 as cnt
from t
where botready='Y'
group by
BotReady,
PurchaseOrderNumber,
AccountNumber,
Linenumber
having count(*)>1
) as q
group by PurchaseOrderNumber
Not much different, just final SUM
is omitted.
Above queries would handle even if some of linenumbers
are not repeated. Try that in fiddle link below.
see DEMO in db<>fiddle
CodePudding user response:
If i undertand well you want to detect duplicate in your table, the same data on all columns, this query can help you :
SELECT BotReady
,PurchaseOrderNumber
,AccountNumber
,LineNumber
,Quantity
,UnitPrice
,RequestedUOM
,Style
,Color
,ProductType
,COUNT(1)
FROM dbo.OrderTable
WHERE BotReady = 'Y'
GROUP BY BotReady
,PurchaseOrderNumber
,AccountNumber
,LineNumber
,Quantity
,UnitPrice
,RequestedUOM
,Style
,Color
,ProductType
HAVING COUNT(1) > 1