I need to select or remove records from badge-records that have a date difference less than 1 day If 2 or more records exists within 24hours all of them has to be removed but one. (the first or last one, this is not important) A select query to find them is ok, so I can remove them manually.
Only records of the same badge number must be compared and removed.
Is this possible using TSQL ?
Example:
------------------ --------------
| TimeStamp | Badge |
------------------ --------------
| 19-10-2021 10:18 | Badge1 |
| 20-10-2021 12:18 | Badge1 |
| 22-10-2021 13:23 | Badge1 |
| 22-10-2021 11:18 | Badge1 | <--- remove
| 22-10-2021 13:18 | Badge1 | <--- remove
| 23-10-2021 14:18 | Badge1 |
| 21-10-2021 09:18 | Badge12 |
| 23-10-2021 10:18 | Badge12 |
| 23-10-2021 23:18 | Badge12 | <--- remove
| 25-10-2021 12:18 | Badge12 |
------------------ --------- ----
CodePudding user response:
Because of the chained dependencies of any given record status on the statuses of all of the preceding records (with the same Badge value), I don't think there is a single statement solution. The following uses a cursor to step through the records while tracking the prior retained values.
DECLARE @Data TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, TimeStamp DATETIME, Badge VARCHAR(100))
INSERT @Data
VALUES
('2021-10-19 10:18', 'Badge1'),
('2021-10-20 12:18', 'Badge1'),
('2021-10-22 13:23', 'Badge1'),
('2021-10-22 11:18', 'Badge1'), -- remove ??? (prior record is out of sequence)
('2021-10-22 13:18', 'Badge1'), -- remove
('2021-10-23 14:18', 'Badge1'),
('2021-10-21 09:18', 'Badge12'),
('2021-10-23 10:18', 'Badge12'),
('2021-10-23 23:18', 'Badge12'), -- remove
('2021-10-25 12:18', 'Badge12')
DECLARE @Id INT
DECLARE @TimeStamp DATETIME
DECLARE @Badge VARCHAR(100)
DECLARE @PriorTimeStamp DATETIME = NULL
DECLARE @PriorBadge VARCHAR(100) = NULL
DECLARE Csr CURSOR FOR
SELECT Id, Timestamp, Badge
FROM @Data
ORDER BY Badge, Timestamp, Id
--ORDER BY Id
OPEN Csr
FETCH NEXT FROM Csr INTO @Id, @TimeStamp, @Badge
WHILE @@FETCH_STATUS = 0
BEGIN
IF @Badge = @PriorBadge AND @TimeStamp < DATEADD(DAY, 1, @PriorTimeStamp)
BEGIN
DELETE FROM @Data WHERE CURRENT OF CSR
END
ELSE BEGIN -- Keep
SET @PriorBadge = @Badge
SET @PriorTimeStamp = @TimeStamp
END
FETCH NEXT FROM Csr INTO @Id, @TimeStamp, @Badge
END
CLOSE Csr
DEALLOCATE Csr
SELECT *
FROM @data
ORDER BY ID
Results:
Id | TimeStamp | Badge |
---|---|---|
1 | 2021-10-19 10:18:00.000 | Badge1 |
2 | 2021-10-20 12:18:00.000 | Badge1 |
4 | 2021-10-22 11:18:00.000 | Badge1 |
6 | 2021-10-23 14:18:00.000 | Badge1 |
7 | 2021-10-21 09:18:00.000 | Badge12 |
8 | 2021-10-23 10:18:00.000 | Badge12 |
10 | 2021-10-25 12:18:00.000 | Badge12 |
Note that the '2021-10-22 13:23' value was retained and the '2021-10-22 11:18' value was deleted due to the ascending dates. If original data order matters, swap out the ORDER BY
clauses above.
After that, all we have to do is select the rows that have DeltaMinutes >= 1440 minutes, which is the number of minutes in a day. If you change the >= to a < , then it will display the rows that are less than 1 day from the previous row.
WITH cte AS
(
SELECT *
,DeltaMinutes = DATEDIFF(mi,LAG(TimeStamp,1,DATEADD(dd,-2,TimeStamp)) OVER (PARTITION BY Badge ORDER BY TimeStamp),TimeStamp)
FROM #TestTable
)
SELECT TimeStamp,Badge
FROM cte
WHERE DeltaMinutes >= 1440 --Minutes in a day.
ORDER BY CONVERT(INT,SUBSTRING(Badge,6,5)) --Sort badge numbers in numeric instead of string order.
,TimeStamp
;
The result from that looks like this...
Of course, LAG is only available from 2012 and onward. If you're using 2008/2008R2 or less, there's another non-recursive trick we can use.
And, with all of that, I have to say that I think your spec may be incorrect, as @TN implies. I think what you really want is to keep the first row (for example) and remove all rows from the output that are within 1440 minutes of THAT row. The next row that's at least 1440 minutes after that first row would become the new "anchor" row and all rows less than 1440 minutes after THAT row would be removed from the output. If that's what you really meant, then post back and we'll solve that problem in a similar manner.
And with the current data, you probably won't see any difference but if the data changes, you will.