I have some data like the following in a Snowflake database:
DEVICE_SERIAL | REASON_CODE | VERSION | MESSAGE_CREATED_AT | NEXT_REASON_CODE |
---|---|---|---|---|
BA1254862158 | 1 | 4 | 2022-06-23 02:06:03 | 4 |
BA1254862158 | 4 | 4 | 2022-06-23 02:07:07 | 1 |
BA1110001111 | 1 | 5 | 2022-06-16 16:19:04 | 4 |
BA1110001111 | 4 | 5 | 2022-06-16 17:43:04 | 1 |
BA1110001111 | 5 | 5 | 2022-06-20 14:37:45 | 4 |
BA1110001111 | 4 | 5 | 2022-06-20 17:31:12 | 1 |
That's the result of a previous query. I'm trying to get the difference between message_created_at timestamps where the device_serial is the same between subsequent rows, the first row of the pair has a reason_code of 1 or 5, and the second row of the pair has a reason_code of 4.
For this example, my desired output would be:
DEVICE_SERIAL | VERSION | DELTA_SECONDS |
---|---|---|
BA1254862158 | 4 | 64 |
BA1110001111 | 5 | 5040 |
BA1110001111 | 5 | 10407 |
It's easy to calculate the time difference between every pair of rows (just lead or lag datediff). But I'm not sure how to structure a query to select only the desired rows so that I can get a datediff between them, without calculating spurious datediffs.
My ultimate goal is to see how these datediffs change between versions. I am but a lowly C programmer; my SQL-fu is weak.
CodePudding user response:
with data as (
    select *,
        -- running count of start codes (1 or 5) per device: each start row and
        -- the rows after it, up to the next start row, share one grp value
        count(case when reason_code in (1, 5) then 1 end)
            over (partition by device_serial order by message_created_at) as grp
        /* or alternately bracket by the end code */
        -- count(case when reason_code = 4 then 1 end)
        --     over (partition by device_serial order by message_created_at desc) as grp
    from T
)
-- one output row per group: seconds from its first message to its last
select device_serial, min(version) as version,
    datediff(second, min(message_created_at), max(message_created_at)) as delta_seconds
from data
group by device_serial, grp;
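The grouping query above returns one row per start code, so a 1 or 5 that is never followed by a 4 would still show up (with a delta of 0), and any other codes inside a group would stretch its max timestamp. If the real data can look like that, a stricter alternative is to pair each row with its immediate successor and keep only the pairs you asked for. This is a minimal sketch, assuming the source table is called T as above; the CTE name paired and the column next_created_at are made up for illustration.

```sql
with paired as (
    select device_serial, version, reason_code, message_created_at,
        -- reason code and timestamp of the next message from the same device
        lead(reason_code)
            over (partition by device_serial order by message_created_at) as next_reason_code,
        lead(message_created_at)
            over (partition by device_serial order by message_created_at) as next_created_at
    from T
)
select device_serial, version,
    datediff(second, message_created_at, next_created_at) as delta_seconds
from paired
-- keep only start rows (1 or 5) whose immediate successor is a 4
where reason_code in (1, 5)
  and next_reason_code = 4;
```

On the sample rows this keeps the same three pairs as the desired output. To compare versions, the final select could be aggregated instead, for example avg(datediff(second, message_created_at, next_created_at)) with group by version.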