SQL How to get the first date from an uninterrupted list. Ie, if there is a gap, I want two first da-CodePudding

Long time lurker, first time asking a question. If I make any etiquette mistakes, please correct me.

Goal: Get the time an individual started and ended a position based on their job code and team code. So if someone left a position, then later returned to the position I would like it to give me two "Start Dates" in that position.

Data Table / Format: My data table includes a weekly snapshot taken every Sunday of every employee, their employeeID, their job code, their team code, and then other stuff. Here's a (fake) example. Variable types in ().

 EID    JobCode   TeamCode   weekstartdate
(INT)  (VARCHAR)   (INT)      (DATETIME)
1     INT111      142        2022-02-20 00:00:00.00
1     INT111      142        2022-02-27 00:00:00.00

I have tried something along the lines of

Select EID,Jobcode,teamcode,MIN(weekstartdate)
from #Dataset
group by EID,Jobcode,Teamcode

The struggle is that if someone has left and returned to the job, it would only show their original weekstartdate. Where I need only uninterrupted time in the position.

If anyone can point me towards functions or features I don't know about that could help with this, I would appreciate it.

CodePudding user response：

This seems to be a type of gaps and islands problem - with some assumptions that you should validate (a row for Sunday only).

Following is one example that is a bit overcomplicated but I started with pre-existing code. ROW_NUMBER is not required here but I use/include it for debugging and learning purposes. You use LAG to get the previous start date in each row and then simply test whether that value is seven days prior to the start date column. If it is, then the row is a "continuation" of the prior row.

with cte as (
   select *, 
         row_number() over (partition by EID, JobCode, TeamCode order by weekstartdate) as rno, 
         lag (weekstartdate) over (partition by EID, JobCode, TeamCode order by weekstartdate) as prevstart 
   from empdata
), 
blocks as (
   select *, 
          case when dateadd(day, 7, prevstart) = weekstartdate then 0 else 1 end as islandstart 
   from cte
)
select * from blocks 
where islandstart = 1 
order by EID, JobCode, TeamCode, rno
;

Remove the WHERE clause to see what values are generated for all rows to help understand how it works. fiddle to demonstrate.

CodePudding user response：

I'm sure there's a much simpler way to do this, but food for thought in any case

http://sqlfiddle.com/#!18/05e76/35

create table your_table 
  ( eid varchar(10),
    jobCode varchar(10),
    teamCode integer,
    weekstartdate date
  );

insert into your_table values 
  ('1','INT111',142,'2020-01-01'),
  ('1','INT111',142,'2020-01-29'), 
  ('2','INT111',142,'2020-01-01'),
  ('2','INT111',142,'2020-01-08'),
  ('2','INT111',142,'2020-01-15'),
  ('2','INT111',142,'2020-01-22'),
  ('2','INT111',142,'2020-01-29')

with all_week_starts as
  ( select distinct 
           weekstartdate
      from your_table
  ),
  all_combinations as 
  ( select distinct 
           eid,
           jobCode,
           teamCode,
           aws.weekstartdate
      from your_table
     cross
      join all_week_starts aws
  )
select eid,
       jobCode,
       TeamCode,
       weekstartdate as first_week_employed
  from (
        select eid,
               jobCode,
               TeamCode,
               employed_this_week,
               lag(employed_this_week) over (partition by 
                             eid,
                             jobCode,
                             TeamCode
                   order by weekstartdate) as employed_last_week,
               weekstartdate
          from (
                select ac.eid,
                       ac.jobCode,
                       ac.TeamCode,
                       ac.weekstartdate,
                       case when yt.weekstartdate is not null then 1 else 0 end as employed_this_week
                  from all_combinations ac
                  left
                  join your_table yt
                    on ac.weekstartdate = yt.weekstartdate
                   and ac.eid = yt.eid
                   and ac.jobCode = yt.jobCode
                   and ac.teamCode = yt.teamCode
               ) as flagged_employment_gaps
        ) as tmp2
 where employed_this_week = 1
   and coalesce(employed_last_week,0) = 0
order
   by eid,
      jobCode,
      teamCode,
      first_week_employed