I have some data consisting of shifts, logging the time periods taken as breaks during the shift.
start_ts end_ts shift_id
2022-01-01T08:31:37Z 2022-01-01T08:58:37Z 1
2022-01-01T08:37:37Z 2022-01-01T09:03:37Z 2
2022-01-01T08:46:37Z 2022-01-01T08:48:37Z 3
I want to map this data to a 15-minute grid, counting how many seconds in total (not per shift) are spent on break during that interval. A solution would look like this:
start_time end_time total_break_seconds
2022-01-01T08:30:00Z 2022-01-01T08:45:00Z 1246
2022-01-01T08:45:00Z 2022-01-01T09:00:00Z 1837
2022-01-01T09:00:00Z 2022-01-01T09:15:00Z 217
I know this is a gaps-and-islands style problem, but I'm not sure how to combine this with the mapping to a time grid element. I've looked at using UNIX_SECONDS/time-to-epoch to get the 15-minute intervals, but can't make it out. I'll be working with pretty large tables so ideally I would do as much work as possible before expanding each time interval to the 15-minute grid, but all solutions welcome.
I'm working on BigQuery
Here's a reproducible example to start with:
SELECT
TIMESTAMP("2022-01-01 08:31:37") AS start_ts,
TIMESTAMP("2022-01-01 08:58:37") AS end_ts,
1 as shift_id
UNION ALL (
SELECT
TIMESTAMP("2022-01-01 08:37:37") AS start_ts,
TIMESTAMP("2022-01-01 09:03:37") AS end_ts,
2 as shift_id
)
UNION ALL (
SELECT
TIMESTAMP("2022-01-01 08:46:37") AS start_ts,
TIMESTAMP("2022-01-01 08:48:37") AS end_ts,
3 as shift_id
)
CodePudding user response:
Consider below
with grid as (
select start_time, timestamp_sub(timestamp_add(start_time, interval 15 minute), interval 1 second) end_time
from (
select max(end_ts) max_end,
timestamp_trunc(min(start_ts), hour) min_start
from your_table
), unnest(generate_timestamp_array(min_start, max_end, interval 15 minute)) start_time
), seconds as (
select ts from your_table,
unnest(generate_timestamp_array(start_ts, end_ts, interval 1 second)) ts
)
select start_time, end_time, count(*) total_break_seconds
from grid
join seconds
on ts between start_time and end_time
group by start_time, end_time
if applied to sample data in your question - output is
CodePudding user response:
With below query:
WITH breaks AS (
SELECT *,
CASE
-- for staring break (considering start_ts and end_ts are in same break)
WHEN break <= start_ts AND end_ts < break INTERVAL 15 MINUTE THEN TIMESTAMP_DIFF(end_ts, start_ts, SECOND)
WHEN break <= start_ts THEN 900 - TIMESTAMP_DIFF(start_ts, break, SECOND)
-- for remaining breaks (considering full break partial break)
ELSE IF(DIV(diff, 900) > 0 AND break INTERVAL 15 MINUTE < end_ts, 900, MOD(diff, 900))
END AS elapsed
FROM sample,
UNNEST(GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_ts, HOUR), TIMESTAMP_TRUNC(end_ts, HOUR) INTERVAL 1 HOUR, INTERVAL 15 MINUTE
)) break,
UNNEST([TIMESTAMP_DIFF(end_ts, break, SECOND)]) diff
WHERE break INTERVAL 15 MINUTE >= start_ts AND break < end_ts
)
SELECT break AS start_time, break INTERVAL 15 MINUTE AS end_time, SUM(elapsed) total_break_seconds
FROM breaks
GROUP BY 1 ORDER BY 1;
Output will be: