Home > front end >  Finding overlapping timestamps in query
Finding overlapping timestamps in query

Time:01-12

I have an issue with removing or tagging overlapping timestamps grouped by certain ID.

Times can overlap in nest and may have same start time or end time.
If second time starts before previous time has ended it will end before or at the same time as previous time. No time differences will go over 12 hours.

Using T-SQL.

Sample data:

ID  task_id starttime                       endtime
11  1       2023-01-10 06:31:00.000         2023-01-10 08:53:00.000
11  1       2023-01-10 08:00:00.000         2023-01-10 08:53:00.000
11  2       2023-01-10 13:14:00.000         2023-01-10 15:15:00.000
11  2       2023-01-10 15:46:00.000         2023-01-10 17:59:00.000
11  2       2023-01-10 18:49:00.000         2023-01-10 18:50:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 11:10:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 10:50:00.000
13  4       2023-01-08 20:00:00.000         2023-01-09 03:44:00.000
13  4       2023-01-08 21:00:00.000         2023-01-09 02:00:00.000
14  5       2023-01-01 19:23:00.000         2023-01-01 20:47:00.000
14  5       2023-01-02 03:35:00.000         2023-01-02 06:57:00.000

Desired result:

ID  task_id starttime                       endtime
11  1       2023-01-10 06:31:00.000         2023-01-10 08:53:00.000
11  2       2023-01-10 13:14:00.000         2023-01-10 15:15:00.000
11  2       2023-01-10 15:46:00.000         2023-01-10 17:59:00.000
11  2       2023-01-10 18:49:00.000         2023-01-10 18:50:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 11:10:00.000
13  4       2023-01-08 20:00:00.000         2023-01-09 03:44:00.000
14  5       2023-01-01 19:23:00.000         2023-01-01 20:47:00.000
14  5       2023-01-02 03:35:00.000         2023-01-02 06:57:00.000

I've tried methods with lead or lag functions but it doesn't seem to play well with edge cases. For example:

case when lead(starttime) over (partition by task_id order by starttime) <> endtime then 1 else 0 end as overlap_tag

Doesn't count the time in ID 11 task_id 2 from 18:49-18:50 as not overlapping and doesn't seem to take into account the day changing.

CodePudding user response:

I only tested it on PostgreSQL, but it might help.


Conditions

Preparation

CREATE TABLE task_duration (
    id INTEGER, 
    task_id INTEGER,
    start_time TIMESTAMP, 
    end_time TIMESTAMP
);

INSERT INTO task_duration VALUES (11, 1, '2023-01-10 06:31:00.000', '2023-01-10 08:53:00.000');
INSERT INTO task_duration VALUES (11, 1, '2023-01-10 08:00:00.000', '2023-01-10 08:53:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 13:14:00.000', '2023-01-10 15:15:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 15:46:00.000', '2023-01-10 17:59:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 18:49:00.000', '2023-01-10 18:50:00.000');
INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 11:10:00.000');
INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 10:50:00.000');
INSERT INTO task_duration VALUES (13, 4, '2023-01-08 20:00:00.000', '2023-01-09 03:44:00.000');
INSERT INTO task_duration VALUES (13, 4, '2023-01-08 21:00:00.000', '2023-01-09 02:00:00.000');
INSERT INTO task_duration VALUES (14, 5, '2023-01-01 19:23:00.000', '2023-01-01 20:47:00.000');
INSERT INTO task_duration VALUES (14, 5, '2023-01-02 03:35:00.000', '2023-01-02 06:57:00.000');

Query

SELECT id, 
    task_id, 
    start_time, 
    end_time
FROM (
    SELECT id, 
        task_id, 
        start_time, 
        end_time, 
        LAG(start_time) OVER (PARTITION BY task_id ORDER BY task_id, start_time, end_time DESC) AS prev_start_time, 
        LAG(end_time) OVER (PARTITION BY task_id ORDER BY task_id, start_time, end_time DESC) AS prev_end_time
    FROM task_duration
) v
WHERE prev_start_time IS NULL  -- 1st condition
    OR NOT (v.end_time >= v.prev_start_time AND v.start_time <= v.prev_end_time);  -- 2nd condition

Result

id|task_id|start_time             |end_time               |
-- ------- ----------------------- ----------------------- 
11|      1|2023-01-10 06:31:00.000|2023-01-10 08:53:00.000|
11|      2|2023-01-10 13:14:00.000|2023-01-10 15:15:00.000|
11|      2|2023-01-10 15:46:00.000|2023-01-10 17:59:00.000|
11|      2|2023-01-10 18:49:00.000|2023-01-10 18:50:00.000|
12|      3|2023-01-09 10:10:00.000|2023-01-09 11:10:00.000|
13|      4|2023-01-08 20:00:00.000|2023-01-09 03:44:00.000|
14|      5|2023-01-01 19:23:00.000|2023-01-01 20:47:00.000|
14|      5|2023-01-02 03:35:00.000|2023-01-02 06:57:00.000|

CodePudding user response:

Try this https://dbfiddle.uk/id_waBN_

with task_duration_wrn(id, task_id, starttime, endtime, rn, act) as (
    select id, task_id, starttime, endtime, 
        rank() over(partition by id, task_id order by starttime, endtime) as rn,
        cast( 
        case when 
         starttime <= lag(endtime) over(partition by id, task_id order by starttime, endtime) 
        then 'PACK' end as VARCHAR(4))
    from task_duration
),
cte(id, task_id, starttime, endtime, rn, lvl, act) as (
    select d.id, d.task_id, d.starttime, d.endtime, d.rn, 1, 
    CAST(NULL AS VARCHAR(4))
    from task_duration_wrn d
    where act is NULL
    
    union all
    
    select d.id, d.task_id, c.starttime, d.endtime, d.rn, c.lvl 1, d.act
    from cte c
    join task_duration_wrn d on c.id = d.id and c.task_id = d.task_id and 
        c.lvl 1 = d.rn
    where d.act = 'PACK'    
)
select id, task_id, starttime, max(endtime) as endtime
from cte c
group by id, task_id, starttime
order by id, task_id, starttime
  • Related