Home > Software engineering >  Find consecutive number of days where a column contains a value
Find consecutive number of days where a column contains a value

Time:10-21

Let's imagine we have a list of employees, where each day we track what their task_list was. We can assign a boolean if that list includes something we call 'critical', in this case let's assume the person asked to lock the door has the 'critical' task.

For this additional responsibility, we want to reward any of the employees who have demonstrated that they are competent enough to manage that task for a given number of days in a row.

I've got this list of critical tasks, and am able to flag them successfully day by day, but am having trouble figuring out how to get our counter applied correctly.

Ideally, it increments for each consecutive day where we find one of the critical tasks within their task_list, and resets to 0 on a day where they do not have a critical task.

Here's an example desired output:

Employee Day Task_List Critical_Task Consec_Days_Crit_Task
Tom J 10/1/22 Sweep, Lock Door* 1 1
Tom J 10/2/22 Sweep, Lock Door* 1 2
Tom J 10/3/22 Mop, Dishes 0 0
Tom J 10/4/22 Sweep, Lock Door* 1 1
Sue B 10/1/22 Mop, Dishes 0 0
Sue B 10/2/22 Mop, Dishes 0 0
Sue B 10/3/22 Sweep, Lock Door* 1 1
Sue B 10/4/22 Mop, Dishes 0 0

I'm able to get the first 4 columns into a temp table no problem, I've tried using a loop to update those counter values as well as tried to use something like the lag function. Just can't seem to wrap my head around how to write the partition statement I guess.

Any advice?

CodePudding user response:

This is a gaps and islands problem, for each employee, you need to identify groups of consecutive rows where Critical_Task=1 (islands) and groups of consecutive rows where Critical_Task=0 (gaps). After that, you can use the count window function to count the consecutive rows where Critical_Task=1 for each employee within the defined groups.

SELECT Employee, Day, Task_List, Critical_Task, 
       COUNT(CASE WHEN Critical_Task=1 THEN Task_List END) OVER 
       (PARTITION BY Employee, grp ORDER BY Day) Consec_Days_Crit_Task
FROM
(
  SELECT *,
     ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Day) -
     ROW_NUMBER() OVER (PARTITION BY Employee,Critical_Task ORDER BY Day) AS grp
  FROM Table_Name
) T
ORDER BY Employee DESC, Day

ROW_NUMBER() OVER ... AS grp is used to identify the required groups.

See a demo.

CodePudding user response:

We can left join back on the same data to identify the maximum date where a non-critical task bounds the current record. It's then just a simple matter of using DATEDIFF() functions to calculate the total number of consecutive days since the last 0.

I have used your sample data from the 1st four columns and inserted them into a table variable to be used in the examples below:

DECLARE @Data as TABLE (
  Employee varchar(50), 
  Day date, 
  Task_List varchar(50), 
  Critical_Task bit)

INSERT INTO @Data
VALUES ('Tom J', '10/1/22', 'Sweep, Lock Door*', 1)
  ,('Tom J', '10/2/22', 'Sweep, Lock Door*', 1)
  ,('Tom J', '10/3/22', 'Mop, Dishes', 0)
  ,('Tom J', '10/4/22', 'Sweep, Lock Door*', 1)
  ,('Sue B', '10/1/22', 'Mop, Dishes', 0)
  ,('Sue B', '10/2/22', 'Mop, Dishes', 0)
  ,('Sue B', '10/3/22', 'Sweep, Lock Door*', 1)
  ,('Sue B', '10/4/22', 'Mop, Dishes', 0);

SELECT Employee, Day, Task_List, Critical_Task
  , DATEDIFF(day, last_0, Day) as Consec_Days_Crit_Task
FROM (
    SELECT a.*
        , coalesce(max(b.Day), 
            dateadd(day, -1, min(a.Day) over(partition by a.employee))) as last_0
    FROM @Data as a
    LEFT JOIN @Data as b
        ON a.Employee = b.Employee AND b.Critical_Task = 0 AND a.Day >= b.Day
    GROUP BY a.Employee, a.Day, a.Task_List, a.Critical_Task
) as a;
  • Related