Home > Mobile >  Aggregating length of time intervals and grouping to fixed time grid
Aggregating length of time intervals and grouping to fixed time grid

Time:06-25

I have some data consisting of shifts, logging the time periods taken as breaks during the shift.

start_ts                end_ts                  shift_id
2022-01-01T08:31:37Z    2022-01-01T08:58:37Z    1
2022-01-01T08:37:37Z    2022-01-01T09:03:37Z    2
2022-01-01T08:46:37Z    2022-01-01T08:48:37Z    3

I want to map this data to a 15-minute grid, counting how many seconds in total (not per shift) are spent on break during that interval. A solution would look like this:

start_time               end_time               total_break_seconds
2022-01-01T08:30:00Z    2022-01-01T08:45:00Z      1246
2022-01-01T08:45:00Z    2022-01-01T09:00:00Z      1837
2022-01-01T09:00:00Z    2022-01-01T09:15:00Z      217

I know this is a gaps-and-islands style problem, but I'm not sure how to combine this with the mapping to a time grid element. I've looked at using UNIX_SECONDS/time-to-epoch to get the 15-minute intervals, but can't make it out. I'll be working with pretty large tables so ideally I would do as much work as possible before expanding each time interval to the 15-minute grid, but all solutions welcome.

I'm working on BigQuery

Here's a reproducible example to start with:

SELECT
    TIMESTAMP("2022-01-01 08:31:37") AS start_ts,
    TIMESTAMP("2022-01-01 08:58:37") AS end_ts,
    1 as shift_id
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:37:37") AS start_ts,
      TIMESTAMP("2022-01-01 09:03:37") AS end_ts,
      2 as shift_id
  )
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:46:37") AS start_ts,
      TIMESTAMP("2022-01-01 08:48:37") AS end_ts,
      3 as shift_id
       )

CodePudding user response:

Consider below

with grid as (
  select start_time, timestamp_sub(timestamp_add(start_time, interval 15 minute), interval 1 second) end_time
  from (
    select max(end_ts) max_end,
      timestamp_trunc(min(start_ts), hour) min_start
    from your_table
  ), unnest(generate_timestamp_array(min_start, max_end, interval 15 minute)) start_time
), seconds as (
  select ts from your_table, 
  unnest(generate_timestamp_array(start_ts, end_ts, interval 1 second)) ts
)
select start_time, end_time, count(*) total_break_seconds
from grid
join seconds
on ts between start_time and end_time
group by  start_time, end_time    

if applied to sample data in your question - output is

enter image description here

CodePudding user response:

With below query:

WITH breaks AS (
  SELECT *,
         CASE
           -- for staring break (considering start_ts and end_ts are in same break)
           WHEN break <= start_ts AND end_ts < break   INTERVAL 15 MINUTE THEN TIMESTAMP_DIFF(end_ts, start_ts, SECOND)
           WHEN break <= start_ts THEN 900 - TIMESTAMP_DIFF(start_ts, break, SECOND)
           -- for remaining breaks (considering full break   partial break)
           ELSE IF(DIV(diff, 900) > 0 AND break   INTERVAL 15 MINUTE < end_ts, 900, MOD(diff, 900))
         END AS elapsed
    FROM sample,
         UNNEST(GENERATE_TIMESTAMP_ARRAY(
           TIMESTAMP_TRUNC(start_ts, HOUR), TIMESTAMP_TRUNC(end_ts, HOUR)   INTERVAL 1 HOUR, INTERVAL 15 MINUTE
         )) break,
         UNNEST([TIMESTAMP_DIFF(end_ts, break, SECOND)]) diff
   WHERE break   INTERVAL 15 MINUTE >= start_ts AND break < end_ts
)
SELECT break AS start_time, break   INTERVAL 15 MINUTE AS end_time, SUM(elapsed) total_break_seconds 
  FROM breaks
 GROUP BY 1 ORDER BY 1;

Output will be:

enter image description here

  • Related