Home > Software engineering >  How to ignore duplicate status rows in events table
How to ignore duplicate status rows in events table

Time:11-22

My source events table looks like this:

Source Table

The result set I need should not include the duplicated "OPEN" values (4, 5), and if possible, return the first occurrence only.

The closest I've been able to get is to group them by looking at answers discussing "gaps and islands" but I'm lost as to how to actually "eliminate" the unneeded rows.

https://dbfiddle.uk/4eIDDmXx

CodePudding user response:

Rather than using gaps and islands, I'd suggest using a simple LAG to get the previous row's OpenOrClosed value - then check if that previous row's OpenOrClosed value is different from the current row's.

This involves a first SELECT to get all rows, with LAG to get the corresponding previous row's value - and then a second SELECT to filter out only those that are different.

See this db<>fiddle which was based on your original one.

SELECT [ID]
       ,[EventDateTime]
       ,[SourceEventID]
       ,[OpenOrClosed] 
FROM
    (SELECT [ID]
           ,[EventDateTime]
           ,[SourceEventID]
           ,[OpenOrClosed]
           ,LAG([OpenOrClosed], 1) OVER (ORDER BY ID) AS [LastOOC]
    FROM    [dbo].[test]
    ) AS SourceData
WHERE LastOOC IS NULL OR OpenOrClosed <> LastOOC

Results are as below

ID  EventDateTime             SourceEventID OpenOrClosed
1   2013-01-31 14:20:00.000   331832        OPEN
2   2013-04-18 14:44:00.000   338907        CLOSED
3   2013-04-19 15:30:00.000   341210        OPEN
6   2013-04-24 12:22:00.000   339250        CLOSED
7   2013-05-08 12:32:00.000   340281        OPEN
8   2013-05-08 12:33:00.000   340282        CLOSED
9   2013-05-08 12:34:00.000   340255        OPEN
  • Related