Home > Mobile >  SQL Consecutive Date Cumulative Count
SQL Consecutive Date Cumulative Count

Time:02-18

I am doing some roster analysis and need to identify when an employee has worked for 5 or more consecutive days. In my table, I can extract data something like the below (note, there are lot more columns, this is just a cut down example):

Emp Start First_Entry
1234 23/06/2016 1
1234 24/06/2016 1
1234 24/06/2016 0
1234 25/06/2016 1
1234 26/06/2016 1
1234 27/06/2016 1
1234 28/06/2016 1
1234 29/06/2016 1
1234 29/06/2016 0
1234 30/06/2016 1
1234 2/07/2016 1
1234 3/07/2016 1
1234 3/07/2016 0
1234 4/07/2016 1
1234 4/07/2016 0
1234 5/07/2016 1
1234 6/07/2016 1
1234 9/07/2016 1
1234 10/07/2016 1
1234 11/07/2016 1
1234 12/07/2016 1

And what I am after is something like this:

Emp Start First_Entry Consecutive_Days Over_5 Status
1234 23/06/2016 1 1 0 Worked < 5
1234 24/06/2016 1 2 0 Worked < 5
1234 24/06/2016 0 2 0 Worked < 5
1234 25/06/2016 1 3 0 Worked < 5
1234 26/06/2016 1 4 0 Worked < 5
1234 27/06/2016 1 5 1 Worked >= 5
1234 28/06/2016 1 6 1 Worked >= 5
1234 29/06/2016 1 7 1 Worked >= 5
1234 29/06/2016 0 7 1 Worked >= 5
1234 30/06/2016 1 8 1 Worked >= 5
1234 02/07/2016 1 1 0 Worked < 5
1234 03/07/2016 1 2 0 Worked < 5
1234 03/07/2016 0 2 0 Worked < 5
1234 04/07/2016 1 3 0 Worked < 5
1234 04/07/2016 0 3 0 Worked < 5
1234 05/07/2016 1 4 0 Worked < 5
1234 06/07/2016 1 5 1 Worked >= 5
1234 09/07/2016 1 1 0 Worked < 5
1234 10/07/2016 1 2 0 Worked < 5
1234 11/07/2016 1 3 0 Worked < 5
1234 12/07/2016 1 4 0 Worked < 5

I'm really not sure how to go about getting the cumulative count for consecutive days, so any help you can give will be amazing

CodePudding user response:

Probably someone would come up with a brilliant solution but this would do. Your problem looks like an "Gaps and Islands" problem. Finding islands of date ranges we can find out the rest easily. In the below SQL, @mindate is not a must, but makes it easier.

CREATE TABLE #temptable
(
    [Emp] CHAR(4),
    [startDate] DATE,
    [First_Entry] BIT
);
INSERT INTO #temptable
(
    [Emp],
    [startDate],
    [First_Entry]
)
VALUES
('1234', N'2016-06-23', 1),
('1234', N'2016-06-24', 1),
('1234', N'2016-06-24', 0),
('1234', N'2016-06-25', 1),
('1234', N'2016-06-26', 1),
('1234', N'2016-06-27', 1),
('1234', N'2016-06-28', 1),
('1234', N'2016-06-29', 1),
('1234', N'2016-06-29', 0),
('1234', N'2016-06-30', 1),
('1234', N'2016-07-02', 1),
('1234', N'2016-07-03', 1),
('1234', N'2016-07-03', 0),
('1234', N'2016-07-04', 1),
('1234', N'2016-07-04', 0),
('1234', N'2016-07-05', 1),
('1234', N'2016-07-06', 1),
('1234', N'2016-07-09', 1),
('1234', N'2016-07-10', 1),
('1234', N'2016-07-11', 1),
('1234', N'2016-07-12', 1);


DECLARE @minDate DATE;
SELECT @minDate = DATEADD(d, -1, MIN(startDate))
FROM #temptable;

WITH firstOnly
AS (SELECT *
    FROM #temptable
    WHERE First_Entry = 1),
     grouper (emp, startDate, grp)
AS (SELECT Emp,
           startDate,
           DATEDIFF(d, @minDate, startDate) - ROW_NUMBER() OVER (PARTITION BY Emp ORDER BY startDate)
    FROM firstOnly),
     islands (emp, START, [end])
AS (SELECT emp,
           MIN(startDate),
           MAX(startDate)
    FROM grouper
    GROUP BY emp,
             grp),
     consecutives (emp, startDate, consecutive_days)
AS (SELECT f.Emp,
           f.startDate,
           --       i.START,
           --       i.[end],
           ROW_NUMBER() OVER (PARTITION BY f.Emp, i.START ORDER BY i.START)
    FROM firstOnly f
        INNER JOIN islands i
            ON f.startDate
               BETWEEN i.START AND i.[end])
SELECT t.Emp,
       t.startDate,
       t.First_Entry,
       c.consecutive_days,
       CAST(CASE
                WHEN c.consecutive_days < 5 THEN
                    0
                ELSE
                    1
            END AS BIT) Over_5,
       CASE
           WHEN c.consecutive_days < 5 THEN
               'Worked < 5'
           ELSE
               'Worked >= 5'
       END [Status]
FROM consecutives c
    INNER JOIN #temptable t
        ON t.Emp = c.emp
           AND t.startDate = c.startDate;

DROP TABLE #temptable;

CodePudding user response:

This is a island and gap problem, You can try to use LAG window function to get the previous startDate row for each Emp, ten use SUM window function to calculate which days are continuous.

Finally, We can use CASE WHEN expression to judge whether the day is greater than 5.

;WITH CTE AS (
    SELECT  [Emp],
        [startDate],
        [First_Entry],
        SUM(CASE WHEN DATEDIFF(dd,f_Dt,startDate) <= 1 THEN 0 ELSE 1 END) OVER(PARTITION BY Emp ORDER BY startDate) grp 
    FROM (
      SELECT *,
           LAG(startDate,1,startDate) OVER(PARTITION BY Emp ORDER BY startDate) f_Dt
      FROM T
    ) t1
)
SELECT  [Emp],
    [startDate],
    [First_Entry],
    SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) Consecutive_Days,
    (CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 1 ELSE 0 END) Over_5,
     (CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 'Worked >= 5' ELSE 'Worked < 5' END) Status
FROM CTE 

sqlfiddle

  • Related