Home > Software engineering >  Breaking down of time aggregated data on several evenly distributed intervals
Breaking down of time aggregated data on several evenly distributed intervals

Time:03-25

The database structure looks like this. It's just a single row representing employee, number of tasks were being processed, the fact of starting whole activity and also the fact of ending and total summary of spent seconds.

employee number_of_tasks start_act end_act total_seconds
AXF-6263 5 12:30 14:10 6000

I want to break this down on equal intervals for 1 hour each, additionally summing up total seconds, falling into particular interval. Thus getting a reslut like this. So is there any kind of SQL approach to implement this problem ?

employee number_of_tasks start_act end_act total_seconds
AXF-6263 5 12:00 13:00 1800
AXF-6263 5 13:00 14:00 3600
AXF-6263 5 14:00 15:00 600

Thanks in advance for any kind of help !

CodePudding user response:

Use generate_series and cross join it with your table.

-- Test case
create temporary table the_table (employee text, number_of_tasks integer, start_act time, end_act time, total_seconds integer);
insert into the_table values ('AXF-6263', 5, '12:30', '14:10', 6000);

-- Query
select employee, number_of_tasks, 
       h::time start_act, h::time    interval '1 hour' end_act, 
       extract('epoch' from least(h::time   interval '1 hour', end_act) - greatest(h::time, start_act))::integer total_seconds
from the_table cross join lateral
     generate_series(date_trunc('hour', current_date   start_act), date_trunc('hour', current_date   end_act), interval '1 hour') h
order by h;
employee number_of_tasks start_act end_act total_seconds
AXF-6263 5 12:00:00 13:00:00 1800
AXF-6263 5 13:00:00 14:00:00 3600
AXF-6263 5 14:00:00 15:00:00 600

CodePudding user response:

Interesting task, I decided to join on a series then figure out he overlap.

My approach here, with a flexible time unit so you could change to days/weeks/etc

with employees AS (
  SELECT
    'AXF-6263' AS employee,
    5 AS number_of_tasks,
    '2022-03-25 12:30'::timestamp AS start_act,
    '2022-03-25 14:10'::timestamp AS end_act,
    6000 AS total_seconds
),

-- select a set of employees to work on
sample_set AS (
  SELECT
    employee,
    start_act,
    end_act
  FROM
    employees
  -- WHERE ...
  -- LIMIT ...
),

-- choose a unit for the interval
unit AS (
  SELECT
    'hour' AS unit,
    '1 hour'::interval AS interval
),

-- generate a full series of time intervals
time_spans AS (
  SELECT
    sample_set.employee,
    generate_series(
       min(date_trunc(unit.unit, start_act)),
       max(date_trunc(unit.unit, end_act)),
       unit.interval
     ) AS start_act
  FROM
    sample_set
  JOIN unit
    ON true
  GROUP BY
    employee,
    unit.interval
)

-- final results
SELECT
  employee,
  number_of_tasks,
  time_spans.start_act,
  time_spans.start_act   unit.interval AS end_act,
  CASE
    WHEN employees.start_act > time_spans.start_act THEN time_spans.start_act - employees.start_act   unit.interval
    WHEN employees.end_act < time_spans.start_act   unit.interval THEN employees.end_act - time_spans.start_act
    ELSE unit.interval
  END AS total_seconds
FROM
  employees
JOIN time_spans USING (employee)
JOIN unit ON
  true
ORDER BY
  employee,
  time_spans.start_act
  • Related