Home > front end >  select and remove SQL records with a timestamp difference < 1 day
select and remove SQL records with a timestamp difference < 1 day

Time:02-25

I need to select or remove records from badge-records that have a date difference less than 1 day If 2 or more records exists within 24hours all of them has to be removed but one. (the first or last one, this is not important) A select query to find them is ok, so I can remove them manually.

Only records of the same badge number must be compared and removed.

Is this possible using TSQL ?

Example:

 ------------------ -------------- 
| TimeStamp        | Badge        |
 ------------------ -------------- 
| 19-10-2021 10:18 | Badge1       |
| 20-10-2021 12:18 | Badge1       |
| 22-10-2021 13:23 | Badge1       |  
| 22-10-2021 11:18 | Badge1       |  <--- remove
| 22-10-2021 13:18 | Badge1       |  <--- remove
| 23-10-2021 14:18 | Badge1       |
| 21-10-2021 09:18 | Badge12      |
| 23-10-2021 10:18 | Badge12      |
| 23-10-2021 23:18 | Badge12      |  <--- remove
| 25-10-2021 12:18 | Badge12      |
 ------------------ --------- ---- 

CodePudding user response:

Because of the chained dependencies of any given record status on the statuses of all of the preceding records (with the same Badge value), I don't think there is a single statement solution. The following uses a cursor to step through the records while tracking the prior retained values.

DECLARE @Data TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, TimeStamp DATETIME, Badge VARCHAR(100))
INSERT @Data
VALUES
    ('2021-10-19 10:18', 'Badge1'),
    ('2021-10-20 12:18', 'Badge1'),
    ('2021-10-22 13:23', 'Badge1'),  
    ('2021-10-22 11:18', 'Badge1'), -- remove ??? (prior record is out of sequence)
    ('2021-10-22 13:18', 'Badge1'), -- remove
    ('2021-10-23 14:18', 'Badge1'),
    ('2021-10-21 09:18', 'Badge12'),
    ('2021-10-23 10:18', 'Badge12'),
    ('2021-10-23 23:18', 'Badge12'), -- remove
    ('2021-10-25 12:18', 'Badge12')

DECLARE @Id INT
DECLARE @TimeStamp DATETIME
DECLARE @Badge VARCHAR(100)
DECLARE @PriorTimeStamp DATETIME = NULL
DECLARE @PriorBadge VARCHAR(100) = NULL

DECLARE Csr CURSOR FOR 
    SELECT Id, Timestamp, Badge
    FROM @Data
    ORDER BY Badge, Timestamp, Id
    --ORDER BY Id
OPEN Csr

FETCH NEXT FROM Csr INTO @Id, @TimeStamp, @Badge
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @Badge = @PriorBadge AND @TimeStamp < DATEADD(DAY, 1, @PriorTimeStamp)
    BEGIN
        DELETE FROM @Data WHERE CURRENT OF CSR
    END
    ELSE BEGIN -- Keep
        SET @PriorBadge = @Badge
        SET @PriorTimeStamp = @TimeStamp
    END

    FETCH NEXT FROM Csr INTO @Id, @TimeStamp, @Badge
END

CLOSE Csr
DEALLOCATE Csr

SELECT *
FROM @data
ORDER BY ID

Results:

Id TimeStamp Badge
1 2021-10-19 10:18:00.000 Badge1
2 2021-10-20 12:18:00.000 Badge1
4 2021-10-22 11:18:00.000 Badge1
6 2021-10-23 14:18:00.000 Badge1
7 2021-10-21 09:18:00.000 Badge12
8 2021-10-23 10:18:00.000 Badge12
10 2021-10-25 12:18:00.000 Badge12

Note that the '2021-10-22 13:23' value was retained and the '2021-10-22 11:18' value was deleted due to the ascending dates. If original data order matters, swap out the ORDER BY clauses above.

See Results From Above

After that, all we have to do is select the rows that have DeltaMinutes >= 1440 minutes, which is the number of minutes in a day. If you change the >= to a < , then it will display the rows that are less than 1 day from the previous row.

   WITH cte AS
(
 SELECT *
        ,DeltaMinutes = DATEDIFF(mi,LAG(TimeStamp,1,DATEADD(dd,-2,TimeStamp)) OVER (PARTITION BY Badge ORDER BY TimeStamp),TimeStamp) 
   FROM #TestTable
)
 SELECT TimeStamp,Badge
   FROM cte 
  WHERE DeltaMinutes >= 1440 --Minutes in a day.
  ORDER BY  CONVERT(INT,SUBSTRING(Badge,6,5)) --Sort badge numbers in numeric instead of string order.
           ,TimeStamp
;

The result from that looks like this...

enter image description here

Of course, LAG is only available from 2012 and onward. If you're using 2008/2008R2 or less, there's another non-recursive trick we can use.

And, with all of that, I have to say that I think your spec may be incorrect, as @TN implies. I think what you really want is to keep the first row (for example) and remove all rows from the output that are within 1440 minutes of THAT row. The next row that's at least 1440 minutes after that first row would become the new "anchor" row and all rows less than 1440 minutes after THAT row would be removed from the output. If that's what you really meant, then post back and we'll solve that problem in a similar manner.

And with the current data, you probably won't see any difference but if the data changes, you will.

  • Related