Home > Blockchain >  How do I pull in duplicate charges based upon certain criteria
How do I pull in duplicate charges based upon certain criteria

Time:12-29

I am trying to find duplicate orders within the dataset that I can search from.

Table: Transaction_Data

Columns: Guest_ID, Order_ID, Name, Quote_Date, Arrival_Date, Sale_Location, Product_Code and Deposit_Amount

Guest_ID Order_ID Name Quote_Date Arrival_Date Sale_Location Product_Code Deposit_Amount
1 123455 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123456 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
2 123457 Guest2 12/2/2022 12/21/2022 Location2 Product2 105
3 123458 Guest3 12/3/2022 12/22/2022 Location3 Product3 110
1 123459 Guest1 12/1/2022 12/20/2022 Location1 Product1 100

I have tried the following but have not had good luck:

SELECT Guest_ID,Order_ID,COUNT(Order_ID) AS Quantity,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount 
FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
GROUP BY Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount
HAVING COUNT (Order_ID) >1
ORDER BY Order_ID,Guest_ID
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount 
INTO #TempOrder
FROM Transaction_Data WHERE Quote_Date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'

SELECT Guest_ID,Order_ID,
(SELECT MAX (Order_ID)
FROM Transaction_Data TD
WHERE TD.Order_ID < TO.Order_ID
) AS Prev_Order,
(SELECT MIN(Order_ID)
FROM Transaction_Data TD
WHERE TD.Order_ID > TO.Order_ID
) As Nxt_Order, TO.Name,TO.Quote_Date,TO.Arrival_Date,TO.Sale_Location,TO.Product_Code,TO.Deposit_Amount
FROM #TempOrder

This is giving me the Order_ID that is before and after, but is not giving me duplicates based upon the Guest.

If you look at the above table I am attempting to pull in only the following records:

Guest_ID Order_ID Name Quote_Date Arrival_Date Sale_Location Product_Code Deposit_Amount
1 123455 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123456 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123459 Guest1 12/1/2022 12/20/2022 Location1 Product1 100

EDIT: I will do better next time I have an answer, but this has been answered.

My work SSMS does not support the WITH Clause so I worked this into a temp table:

DROP TABLE IF EXISTS #TempRes;

SELECT rth.ip_number,
       ror.reservation_id,
       COUNT(ror.reservation_id) OVER (PARTITION BY rth.ip_number,
                                                    ror.quote_date,
                                                    rdd.deposit_amount
                                       ORDER BY ror.reservation_id
                                      ) AS quantity,
       MIN(ror.reservation_id) OVER (PARTITION BY rth.ip_number, ror.quote_date, rdd.deposit_amount) AS min_order,
       ror.cc_name,
       ror.quote_date,
       ror.arrival_date,
       rth.sale_location_code,
       rdd.deposit_amount
INTO #TempRes
FROM dbo.r_order_reservation ror
    JOIN dbo.r_transaction_header rth
        ON rth.reservation_id = ror.reservation_id
    JOIN dbo.r_transaction_detail rtd
        ON rtd.reservation_id = ror.reservation_id
           AND rtd.product_header_code != '7777777'
    JOIN dbo.r_deposit_detail rdd
        ON rdd.reservation_id = ror.reservation_id
WHERE ror.quote_date > '2022-12-01'
      AND ror.operator_id = 'freeride'
      AND rdd.deposit_amount NOT LIKE '-%';

SELECT *
FROM #TempRes tr
WHERE tr.quantity > 1
ORDER BY tr.reservation_id,
         tr.ip_number;

DROP TABLE #TempRes;

This gave me the quantity of each but also grouped them together correctly. Thank you!

CodePudding user response:

There is a problem in your Grouping expression, you have included Order_ID but that is unique, so not duplicated. To find a duplicate you need to remove that field from the comparison.

This response highlights a standard solution to selecting just the duplicated records by using the ROW_NUMBER() window function that partitions by your grouping set, then we return only the records that were not the first record in the set, so only the duplicates, not the original.

  • Window functions provide us a way to evaluate aggregate expressions across a dataset inline with the entire set, so without actually grouping the results.

There is a caveat, we need to use a CTE or subquery to calculate the Window Function if we want to filter or group on the results of the aggregation.

WITH AggregatedData as
(
    SELECT Guest_ID,Order_ID
        ,ROW_NUMBER() OVER( 
            PARTITION BY Guest_ID, Quote_Date, Product_Code,Deposit_Amount 
            ORDER BY Order_ID
        ) AS Order_Duplicate
        ,Name,Quote_Date,Arrival_Date
        ,Sale_Location,Product_Code,Deposit_Amount 
    FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date
    ,Sale_Location,Product_Code,Deposit_Amount 
FROM AggregatedData 
WHERE Order_Duplicate > 1
ORDER BY Order_ID,Guest_ID

This should return just the duplicated records, which is often more useful because these represent the transactions that most likely need to be reversed or refunded.

Guest_ID Order_ID Name Quote_Date Arrival_Date Sale_Location Product_Code Deposit_Amount
1 123456 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123459 Guest1 12/1/2022 12/20/2022 Location1 Product1 100

If you want to return ALL of the records where there is a duplicate, then we can use the window query version of COUNT() with the OVER() clause which is very similar:

WITH AggregatedData as
(
    SELECT Guest_ID,Order_ID
        ,COUNT() OVER( 
            PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount 
            ORDER BY Order_ID
        ) AS Quantity
        ,Name,Quote_Date,Arrival_Date
        ,Sale_Location,Product_Code,Deposit_Amount 
    FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Name,Quote_Date,Arrival_Date
    ,Sale_Location,Product_Code,Deposit_Amount 
FROM AggregatedData 
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID

Guest_ID Order_ID Name Quote_Date Arrival_Date Sale_Location Product_Code Deposit_Amount
1 123455 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123456 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123459 Guest1 12/1/2022 12/20/2022 Location1 Product1 100

We can extend this further by including both of the COUNT and ROW_NUMBER results in the output or we can use other aggregates like MIN:

WITH AggregatedData as
(
    SELECT Guest_ID,Order_ID
        ,COUNT() OVER( 
            PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount 
            ORDER BY Order_ID
        ) AS Quantity
        ,MIN(Order_ID) OVER( 
            PARTITION BY Guest_ID, Quote_Date, Product_Code, Deposit_Amount 
        ) AS Min_Order_ID
        ,Name,Quote_Date,Arrival_Date
        ,Sale_Location,Product_Code,Deposit_Amount 
    FROM Transaction_Data WHERE Quote_date >='2022-11-01' and Deposit_Amount NOT LIKE '-%'
)
SELECT Guest_ID,Order_ID,Quantity,Min_Order_ID,Name,Quote_Date
    ,Arrival_Date,Sale_Location,Product_Code,Deposit_Amount 
FROM AggregatedData 
WHERE Quantity > 1
ORDER BY Order_ID, Guest_ID

Guest_ID Order_ID Quantity Min_Order_ID Name Quote_Date Arrival_Date Sale_Location Product_Code Deposit_Amount
1 123455 3 123455 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123456 3 123455 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
1 123459 3 123455 Guest1 12/1/2022 12/20/2022 Location1 Product1 100
  • Technically we could have used MIN as the first example, and filtered where the result of MIN did not equal the Order_ID but that syntax is harder to interpret if you are not familiar with these concepts.
  • Related