Home > Blockchain >  Remove consecutive duplicate rows from a table
Remove consecutive duplicate rows from a table

Time:12-27

I am stuck on a problem. I want to remove consecutive duplicate records in the table,
I,e, in the below table I want to calculate the total cost without consecutive duplication.
Like, row 3 should be removed as it is consecutively duplicated with row 2 as all 3 column data is the same.
And same is the case in the second group, row 7 should be removed as it is a duplication of 6.
The total cost at the end should be 10.

How can I do it in SSMS?

ClaimID ClaimLine Cost
M0001R1616878951 2 10
M0001R1616878951 2 -10
M0001R1616878951 2 -10
M0001R1616878951 3 10
M0001R1616878951 3 -10
M0001R1616878951 3 10
M0001R1616878951 3 10

I searched for this problem and tried lead and lag keywords but didnt work.

CodePudding user response:

I prepared an example for your issue and it might be solution for that.

I used CTE, ROW_NUMBER and IIF expressions to generate row_number and filter for duplicate rows.

Preparing the example data:

DECLARE @vClaims TABLE (
    ClaimID     NVARCHAR(16),
    ClaimLine   SMALLINT,
    Cost        SMALLINT
)

INSERT INTO @vClaims
VALUES
    ('M0001R1616878951', 2, 10),
    ('M0001R1616878951', 2, -10),
    ('M0001R1616878951', 2, -10),
    ('M0001R1616878951', 3, 10),
    ('M0001R1616878951', 3, -10),
    ('M0001R1616878951', 3, 10),
    ('M0001R1616878951', 3, 10)

And the query script:

;WITH CTE_ClaimsWithSort AS (
    SELECT
        ClaimID,
        ClaimLine,
        Cost,
        RowNumber   = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
    FROM
        @vClaims
), CTE_ClaimsFiltered AS (
    SELECT 
        ClaimID,
        ClaimLine,
        Cost,
        RowNumber,
        isDuplicate     = IIF(
                            LAG(ClaimID) OVER(ORDER BY RowNumber) = ClaimID
                            AND LAG(ClaimLine) OVER(ORDER BY RowNumber) = ClaimLine
                            AND LAG(Cost) OVER(ORDER BY RowNumber) = Cost
                        , 1, 0)
    FROM 
        CTE_ClaimsWithSort
)
SELECT 
    ClaimID,
    ClaimLine,
    Cost,
    RowNumber,
    isDuplicate
FROM 
    CTE_ClaimsFiltered
WHERE
    isDuplicate = 0

First part of cte: generate row_numbers for your example data. If you have a date column you can use instead of it.

Second part of cte: find and filter duplicate rows by ClaimID, ClaimLine and Cost with IIF expression

The result:

ClaimID ClaimLine Cost RowNumber isDuplicate
M0001R1616878951 2 10 1 0
M0001R1616878951 2 -10 2 0
M0001R1616878951 3 10 4 0
M0001R1616878951 3 -10 5 0
M0001R1616878951 3 10 6 0
  • Related