I can not show a query that isn't working. Because I simply can not even wrap my brain around how to do this. I can count number of devices and number of dates for a specific device.
I have a table containing a series of dates where we have seen a specific device. Devices are registered daily, if they were seen.
I would like to count the number of breaks in the timestamp flow. So if there is a date missing in the timestamp it should register as a new block for this device.
Since today is the 25th of may:
- Device00 should count 3 times (3 starts [green], and 3 stops [red])
- Device58 should count 2 times (3 starts [green], and 2 stops [red])
Device58, is not counted because it was active today (25/5). if Device58 is not active tomorrow, then it should be counted. If it is active tomorrow then it should not be counted.
Devices are only counted when there is at least a calendar day between entry's in the data.
Data are collected once a day (4 am) and is not time critical.
Can this be done with a SQL query? My data set is currently 100.000 rows, so I'm hoping for an "easy" way to count these. But I can't quite figure it out.
DATA:
Device | TimeStamp |
---|---|
Device00 | 2022-04-25 |
Device00 | 2022-04-26 |
Device00 | 2022-04-27 |
Device00 | 2022-04-28 |
Device00 | 2022-04-29 |
Device00 | 2022-04-30 |
Device00 | 2022-05-01 |
Device00 | 2022-05-02 |
Device00 | 2022-05-03 |
Device00 | 2022-05-04 |
Device00 | 2022-05-05 |
Device00 | 2022-05-06 |
Device00 | 2022-05-07 |
Device00 | 2022-05-08 |
Device00 | 2022-05-09 |
Device00 | 2022-05-11 |
Device00 | 2022-05-12 |
Device00 | 2022-05-14 |
Device00 | 2022-05-15 |
Device00 | 2022-05-16 |
Device00 | 2022-05-17 |
Device00 | 2022-05-18 |
Device00 | 2022-05-19 |
Device58 | 2022-04-25 |
Device58 | 2022-04-26 |
Device58 | 2022-04-27 |
Device58 | 2022-04-28 |
Device58 | 2022-04-29 |
Device58 | 2022-04-30 |
Device58 | 2022-05-01 |
Device58 | 2022-05-02 |
Device58 | 2022-05-03 |
Device58 | 2022-05-04 |
Device58 | 2022-05-05 |
Device58 | 2022-05-06 |
Device58 | 2022-05-07 |
Device58 | 2022-05-08 |
Device58 | 2022-05-09 |
Device58 | 2022-05-11 |
Device58 | 2022-05-12 |
Device58 | 2022-05-14 |
Device58 | 2022-05-15 |
Device58 | 2022-05-16 |
Device58 | 2022-05-17 |
Device58 | 2022-05-18 |
Device58 | 2022-05-19 |
Device58 | 2022-05-20 |
Device58 | 2022-05-21 |
Device58 | 2022-05-22 |
Device58 | 2022-05-23 |
Device58 | 2022-05-24 |
Device58 | 2022-05-25 |
Expected result:
Device | Count |
---|---|
Device00 | 3 |
Device58 | 2 |
CodePudding user response:
SELECT t2.device,
CASE WHEN t2.starts > t2.stops THEN t2.stops ELSE t2.starts END
FROM
(
SELECT device,
COUNT(CASE WHEN t.prev IS null OR t.prev > 1 THEN 1 ELSE null END) AS starts,
COUNT(CASE WHEN t.next IS null OR t.next > 1 THEN 1 ELSE null END) AS stops
FROM
(
SELECT *,
DATEDIFF(day, LAG(timestamp) OVER(PARTITION BY device ORDER BY timestamp), timestamp)
AS prev,
DATEDIFF(day, timestamp, LEAD(timestamp) OVER(PARTITION BY device ORDER BY timestamp))
AS next
FROM tbl
) AS t
WHERE timestamp <= DATEADD(day, -1, GETDATE())
GROUP BY device
) AS t2;
The core idea of the query is utilizing "window function" LAG()
and LEAD()
which introduced in SQL Server 2012. Both function allows for queries to be aware to what comes before and after of the current row.
CodePudding user response:
Requires at least SQL Server 2012. In earlier versions you could do it with a cursor, I guess. The @Device
variable stores the list of devices, @Log
is your sample data. By default the result set shows aggregates for the last month, but if you want to see more - change the @Months
variable.
DECLARE @Log TABLE ([Device] VARCHAR(50), [TimeStamp] DATE)
DECLARE @Device TABLE ([Device] VARCHAR(50))
--=========================================
--START OF TEST DATA
INSERT INTO @Log VALUES ('Device00', '2022-04-25'), ('Device00', '2022-04-26'), ('Device00', '2022-04-27'), ('Device00', '2022-04-28'), ('Device00', '2022-04-29'), ('Device00', '2022-04-30'), ('Device00', '2022-05-01'), ('Device00', '2022-05-02'), ('Device00', '2022-05-03'), ('Device00', '2022-05-04'), ('Device00', '2022-05-05'), ('Device00', '2022-05-06'), ('Device00', '2022-05-07'), ('Device00', '2022-05-08'), ('Device00', '2022-05-09'), ('Device00', '2022-05-11'), ('Device00', '2022-05-12'), ('Device00', '2022-05-14'), ('Device00', '2022-05-15'), ('Device00', '2022-05-16'), ('Device00', '2022-05-17'), ('Device00', '2022-05-18'), ('Device00', '2022-05-19'), ('Device58', '2022-04-25'), ('Device58', '2022-04-26'), ('Device58', '2022-04-27'), ('Device58', '2022-04-28'), ('Device58', '2022-04-29'), ('Device58', '2022-04-30'), ('Device58', '2022-05-01'), ('Device58', '2022-05-02'), ('Device58', '2022-05-03'), ('Device58', '2022-05-04'), ('Device58', '2022-05-05'), ('Device58', '2022-05-06'), ('Device58', '2022-05-07'), ('Device58', '2022-05-08'), ('Device58', '2022-05-09'), ('Device58', '2022-05-11'), ('Device58', '2022-05-12'), ('Device58', '2022-05-14'), ('Device58', '2022-05-15'), ('Device58', '2022-05-16'), ('Device58', '2022-05-17'), ('Device58', '2022-05-18'), ('Device58', '2022-05-19'), ('Device58', '2022-05-20'), ('Device58', '2022-05-21'), ('Device58', '2022-05-22'), ('Device58', '2022-05-23'), ('Device58', '2022-05-24'), ('Device58', '2022-05-25')
INSERT INTO @Device VALUES ('Device00'), ('Device58')
--END OF TEST DATA
--=========================================
DECLARE @Months INT = 1
DECLARE @Today DATE = GETDATE()
DECLARE @EarliestDate DATE = DATEADD(M, -@Months, @Today)
; WITH CTE
AS
(
SELECT @Today AS [Date]
UNION ALL
SELECT DATEADD(D, -1, [Date])
FROM CTE WHERE [Date] > @EarliestDate
)
SELECT [Device], COUNT([Starts]) AS [Starts], COUNT([Stops]) AS [Stops] FROM
(
SELECT
[DeviceDates].[Device],
CASE
WHEN
(
LEAD([Log].[TimeStamp], 1) OVER (ORDER BY [DeviceDates].[Device], [DeviceDates].[Date]) IS NOT NULL
AND [Log].[TimeStamp] IS NULL
)
OR
(
ROW_NUMBER() OVER (PARTITION BY [DeviceDates].[Device] ORDER BY [DeviceDates].[Date]) = 1
AND [Log].[TimeStamp] IS NOT NULL
)
THEN 1
END [Starts],
CASE
WHEN [DeviceDates].[Date] != @Today
AND [Log].[TimeStamp] IS NOT NULL
AND LEAD([Log].[TimeStamp], 1) OVER (ORDER BY [DeviceDates].[Device], [DeviceDates].[Date]) IS NULL
THEN 1
END [Stops]
FROM (
SELECT Device, [Date]
FROM CTE
CROSS JOIN @Device
) AS [DeviceDates]
LEFT OUTER JOIN @Log AS [Log]
ON [DeviceDates].[Device] = [Log].[Device]
AND [DeviceDates].[Date] = [Log].[TimeStamp]
) aggregated
GROUP BY [Device]
CodePudding user response:
with data as (
select *,
datediff(day, cast('20000101' as date), "TimeStamp")
- row_number() over (partition by Device order by "TimeStamp") as grp
from T
)
select distinct Device, count(*) over (partition by Device) as "Count"
from data
group by Device, grp
-- eliminate anything that still on a streak through yesterday
having max(Timestamp) < dateadd(day, -1, cast(getdate() as date));
Uses standard gaps and islands numbering. It does rely on row_number()
so it will work on 2005 editions and later.
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=b1698ddc0350c3087a3fdaf9554c54ad