I am trying to find duplicate orders within the dataset that I can search from.
Table: Transaction_Data
Columns: Guest_ID, Order_ID, Name, Quote_Date, Arrival_Date, Sale_Location, Product_Code and Deposit_Amount
Guest_ID | Order_ID | Name | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount |
---|---|---|---|---|---|---|---|
1 | 123455 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123456 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
2 | 123457 | Guest2 | 12/2/2022 | 12/21/2022 | Location2 | Product2 | 105 |
3 | 123458 | Guest3 | 12/3/2022 | 12/22/2022 | Location3 | Product3 | 110 |
1 | 123459 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
I have tried the following but have not had good luck:
SELECT Guest_ID,Order_ID,COUNT(Order_ID) AS Quantity,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
GROUP BY Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
HAVING COUNT (Order_ID) >1
ORDER BY Order_ID,Guest_ID
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
INTO #TempOrder
FROM Transaction_Data WHERE Quote_Date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
SELECT Guest_ID,Order_ID,
(SELECT MAX (Order_ID)
FROM Transaction_Data TD
WHERE TD.Order_ID < TO.Order_ID
) AS Prev_Order,
(SELECT MIN(Order_ID)
FROM Transaction_Data TD
WHERE TD.Order_ID > TO.Order_ID
) As Nxt_Order, TO.Name,TO.Quote_Date,TO.Arrival_Date,TO.Sale_Location,TO.Product_Code,TO.Deposit_Amount
FROM #TempOrder
This is giving me the Order_ID that is before and after, but is not giving me duplicates based upon the Guest.
If you look at the above table I am attempting to pull in only the following records:
Guest_ID | Order_ID | Name | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount |
---|---|---|---|---|---|---|---|
1 | 123455 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123456 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123459 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
EDIT: I will do better next time I have an answer, but this has been answered.
My work SSMS does not support the WITH Clause so I worked this into a temp table:
DROP TABLE IF EXISTS #TempRes;
SELECT rth.ip_number,
ror.reservation_id,
COUNT(ror.reservation_id) OVER (PARTITION BY rth.ip_number,
ror.quote_date,
rdd.deposit_amount
ORDER BY ror.reservation_id
) AS quantity,
MIN(ror.reservation_id) OVER (PARTITION BY rth.ip_number, ror.quote_date, rdd.deposit_amount) AS min_order,
ror.cc_name,
ror.quote_date,
ror.arrival_date,
rth.sale_location_code,
rdd.deposit_amount
INTO #TempRes
FROM dbo.r_order_reservation ror
JOIN dbo.r_transaction_header rth
ON rth.reservation_id = ror.reservation_id
JOIN dbo.r_transaction_detail rtd
ON rtd.reservation_id = ror.reservation_id
AND rtd.product_header_code != '7777777'
JOIN dbo.r_deposit_detail rdd
ON rdd.reservation_id = ror.reservation_id
WHERE ror.quote_date > '2022-12-01'
AND ror.operator_id = 'freeride'
AND rdd.deposit_amount NOT LIKE '-%';
SELECT *
FROM #TempRes tr
WHERE tr.quantity > 1
ORDER BY tr.reservation_id,
tr.ip_number;
DROP TABLE #TempRes;
This gave me the quantity of each but also grouped them together correctly. Thank you!
CodePudding user response:
There is a problem in your Grouping expression, you have included Order_ID
but that is unique, so not duplicated. To find a duplicate you need to remove that field from the comparison.
This response highlights a standard solution to selecting just the duplicated records by using the ROW_NUMBER()
window function that partitions by your grouping set, then we return only the records that were not the first record in the set, so only the duplicates, not the original.
- Window functions provide us a way to evaluate aggregate expressions across a dataset inline with the entire set, so without actually grouping the results.
There is a caveat, we need to use a CTE or subquery to calculate the Window Function if we want to filter or group on the results of the aggregation.
WITH AggregatedData as
(
SELECT Guest_ID,Order_ID
,ROW_NUMBER() OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code,Deposit_Amount
ORDER BY Order_ID
) AS Order_Duplicate
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM AggregatedData
WHERE Order_Duplicate > 1
ORDER BY Order_ID,Guest_ID
This should return just the duplicated records, which is often more useful because these represent the transactions that most likely need to be reversed or refunded.
Guest_ID | Order_ID | Name | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount |
---|---|---|---|---|---|---|---|
1 | 123456 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123459 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
If you want to return ALL of the records where there is a duplicate, then we can use the window query version of COUNT()
with the OVER()
clause which is very similar:
WITH AggregatedData as
(
SELECT Guest_ID,Order_ID
,COUNT() OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
ORDER BY Order_ID
) AS Quantity
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM AggregatedData
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID
Guest_ID | Order_ID | Name | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount |
---|---|---|---|---|---|---|---|
1 | 123455 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123456 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123459 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
We can extend this further by including both of the COUNT
and ROW_NUMBER
results in the output or we can use other aggregates like MIN
:
WITH AggregatedData as
(
SELECT Guest_ID,Order_ID
,COUNT() OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
ORDER BY Order_ID
) AS Quantity
,MIN(Order_ID) OVER(
PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount
) AS Min_Order_ID
,Name,Quote_Date,Arrival_Date
,Sale_Location,Product_Code,Deposit_Amount
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Quantity,Min_Order_ID,Name,Quote_Date
,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
FROM AggregatedData
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID
Guest_ID | Order_ID | Quantity | Min_Order_ID | Name | Quote_Date | Arrival_Date | Sale_Location | Product_Code | Deposit_Amount |
---|---|---|---|---|---|---|---|---|---|
1 | 123455 | 3 | 123455 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123456 | 3 | 123455 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
1 | 123459 | 3 | 123455 | Guest1 | 12/1/2022 | 12/20/2022 | Location1 | Product1 | 100 |
- Technically we could have used
MIN
as the first example, and filtered where the result ofMIN
did not equal theOrder_ID
but that syntax is harder to interpret if you are not familiar with these concepts.