I have a file that contains all of the time punch shifts for each employee like so:
employee | start_time | end_time |
---|---|---|
123 | 2022-10-23 10:40:00.000 | 2022-10-23 14:00:00.000 |
123 | 2022-10-23 14:00:00.000 | 2022-10-23 14:30:00.000 |
123 | 2022-10-23 14:35:00.000 | 2022-10-23 17:07:00.000 |
541 | 2022-10-23 06:50:00.000 | 2022-10-23 12:00:00.000 |
541 | 2022-10-23 13:00:00.000 | 2022-10-23 15:30:00.000 |
However, I'm wanting to return a summarized row for each employee IF the clock-ins and clock-outs fall within overlapping hours like so
employee | start_time | end_time |
---|---|---|
123 | 2022-10-23 10:40:00.000 | 2022-10-23 17:07:00.000 |
541 | 2022-10-23 06:50:00.000 | 2022-10-23 12:00:00.000 |
541 | 2022-10-23 13:00:00.000 | 2022-10-23 15:30:00.000 |
The reason why I want to do this is because I then am going to count employees that were part of a timeshift during a given period of time. Right now my query is counting duplicates because there are employees clocking out and clocking back in within the same hour.
This is my code for counting employees for each hour
WITH Hours AS(
SELECT 1 AS HOUR
UNION ALL
SELECT HOUR 1 FROM Hours WHERE HOUR < 24
)
SELECT
CAST(Start_Time AS DATE) [DATE],
HOUR,
COUNT(Emp_Int) AS [Head Count]
FROM
Hours
LEFT JOIN
TIME_SHIFTS_TABLE T on HOUR BETWEEN DATEPART(HOUR, START_TIME) AND DATEPART(HOUR, END_TIME)
GROUP BY
CAST(Start_Time AS DATE), HOUR
ORDER BY
CAST(Start_Time AS DATE), HOUR asc
This join works but is counting duplicates when an employee logs out and back in within the same hour.
CodePudding user response:
This is a gaps-and-islands problem. One way of dealing with this kind of task is to
- first generate the partitions you need with a running sum,
- then apply aggregation with
MIN(start_time)
andMAX(end_time)
.
You can craft a running sum by:
- checking when the difference between current start time and previous end time is bigger than 59 minutes, with
DATEDIFF
, hence flag with 1 if that's the case - use a
SUM
window function over this flag, for each of your employee
Then you can just run the aggregation on your newly created partitioning.
WITH cte AS (
SELECT *,
CASE WHEN DATEDIFF(mi, LAG(end_time) OVER(PARTITION BY employee ORDER BY start_time), start_time) >= 60
THEN 1 ELSE 0
END AS change_partition
FROM tab
), cte2 AS (
SELECT *, SUM(change_partition) OVER(PARTITION BY employee ORDER BY start_time) AS partitions
FROM cte
)
SELECT employee, MIN(start_time) AS start_time, MAX(end_time) AS end_time
FROM cte2
GROUP BY employee, partitions
Check the demo here.