Summarized punch shifts for overlapping clock outs and ins-CodePudding

I have a file that contains all of the time punch shifts for each employee like so:

employee	start_time	end_time
123	2022-10-23 10:40:00.000	2022-10-23 14:00:00.000
123	2022-10-23 14:00:00.000	2022-10-23 14:30:00.000
123	2022-10-23 14:35:00.000	2022-10-23 17:07:00.000
541	2022-10-23 06:50:00.000	2022-10-23 12:00:00.000
541	2022-10-23 13:00:00.000	2022-10-23 15:30:00.000

However, I'm wanting to return a summarized row for each employee IF the clock-ins and clock-outs fall within overlapping hours like so

employee	start_time	end_time
123	2022-10-23 10:40:00.000	2022-10-23 17:07:00.000
541	2022-10-23 06:50:00.000	2022-10-23 12:00:00.000
541	2022-10-23 13:00:00.000	2022-10-23 15:30:00.000

The reason why I want to do this is because I then am going to count employees that were part of a timeshift during a given period of time. Right now my query is counting duplicates because there are employees clocking out and clocking back in within the same hour.

This is my code for counting employees for each hour

WITH  Hours AS(
    SELECT 1 AS HOUR
    UNION ALL
    SELECT HOUR   1 FROM Hours WHERE HOUR < 24
)
SELECT 
   CAST(Start_Time AS DATE) [DATE], 
   HOUR, 
   COUNT(Emp_Int) AS [Head Count]
FROM 
   Hours
LEFT JOIN 
   TIME_SHIFTS_TABLE T on HOUR BETWEEN DATEPART(HOUR, START_TIME) AND DATEPART(HOUR, END_TIME)
GROUP BY 
   CAST(Start_Time AS DATE), HOUR
ORDER BY 
   CAST(Start_Time AS DATE), HOUR asc

This join works but is counting duplicates when an employee logs out and back in within the same hour.

CodePudding user response：

This is a gaps-and-islands problem. One way of dealing with this kind of task is to

first generate the partitions you need with a running sum,
then apply aggregation with MIN(start_time) and MAX(end_time).

You can craft a running sum by:

checking when the difference between current start time and previous end time is bigger than 59 minutes, with DATEDIFF, hence flag with 1 if that's the case
use a SUM window function over this flag, for each of your employee

Then you can just run the aggregation on your newly created partitioning.

WITH cte AS (
    SELECT *,
           CASE WHEN DATEDIFF(mi, LAG(end_time) OVER(PARTITION BY employee ORDER BY start_time), start_time) >= 60 
                THEN 1 ELSE 0 
           END AS change_partition
    FROM tab
), cte2 AS (
    SELECT *, SUM(change_partition) OVER(PARTITION BY employee ORDER BY start_time) AS partitions
    FROM cte
)
SELECT employee, MIN(start_time) AS start_time, MAX(end_time) AS end_time 
FROM cte2
GROUP BY employee, partitions

Check the demo here.