Long time lurker, first time asking a question. If I make any etiquette mistakes, please correct me.
Goal: Get the time an individual started and ended a position based on their job code and team code. So if someone left a position, then later returned to the position I would like it to give me two "Start Dates" in that position.
Data Table / Format: My data table includes a weekly snapshot taken every Sunday of every employee, their employeeID, their job code, their team code, and then other stuff. Here's a (fake) example. Variable types in ().
EID JobCode TeamCode weekstartdate
(INT) (VARCHAR) (INT) (DATETIME)
1 INT111 142 2022-02-20 00:00:00.00
1 INT111 142 2022-02-27 00:00:00.00
I have tried something along the lines of
Select EID,Jobcode,teamcode,MIN(weekstartdate)
from #Dataset
group by EID,Jobcode,Teamcode
The struggle is that if someone has left and returned to the job, it would only show their original weekstartdate. Where I need only uninterrupted time in the position.
If anyone can point me towards functions or features I don't know about that could help with this, I would appreciate it.
CodePudding user response:
This seems to be a type of gaps and islands problem - with some assumptions that you should validate (a row for Sunday only).
Following is one example that is a bit overcomplicated but I started with pre-existing code. ROW_NUMBER is not required here but I use/include it for debugging and learning purposes. You use LAG to get the previous start date in each row and then simply test whether that value is seven days prior to the start date column. If it is, then the row is a "continuation" of the prior row.
with cte as (
select *,
row_number() over (partition by EID, JobCode, TeamCode order by weekstartdate) as rno,
lag (weekstartdate) over (partition by EID, JobCode, TeamCode order by weekstartdate) as prevstart
from empdata
),
blocks as (
select *,
case when dateadd(day, 7, prevstart) = weekstartdate then 0 else 1 end as islandstart
from cte
)
select * from blocks
where islandstart = 1
order by EID, JobCode, TeamCode, rno
;
Remove the WHERE clause to see what values are generated for all rows to help understand how it works. fiddle to demonstrate.
CodePudding user response:
I'm sure there's a much simpler way to do this, but food for thought in any case
http://sqlfiddle.com/#!18/05e76/35
create table your_table
( eid varchar(10),
jobCode varchar(10),
teamCode integer,
weekstartdate date
);
insert into your_table values
('1','INT111',142,'2020-01-01'),
('1','INT111',142,'2020-01-29'),
('2','INT111',142,'2020-01-01'),
('2','INT111',142,'2020-01-08'),
('2','INT111',142,'2020-01-15'),
('2','INT111',142,'2020-01-22'),
('2','INT111',142,'2020-01-29')
with all_week_starts as
( select distinct
weekstartdate
from your_table
),
all_combinations as
( select distinct
eid,
jobCode,
teamCode,
aws.weekstartdate
from your_table
cross
join all_week_starts aws
)
select eid,
jobCode,
TeamCode,
weekstartdate as first_week_employed
from (
select eid,
jobCode,
TeamCode,
employed_this_week,
lag(employed_this_week) over (partition by
eid,
jobCode,
TeamCode
order by weekstartdate) as employed_last_week,
weekstartdate
from (
select ac.eid,
ac.jobCode,
ac.TeamCode,
ac.weekstartdate,
case when yt.weekstartdate is not null then 1 else 0 end as employed_this_week
from all_combinations ac
left
join your_table yt
on ac.weekstartdate = yt.weekstartdate
and ac.eid = yt.eid
and ac.jobCode = yt.jobCode
and ac.teamCode = yt.teamCode
) as flagged_employment_gaps
) as tmp2
where employed_this_week = 1
and coalesce(employed_last_week,0) = 0
order
by eid,
jobCode,
teamCode,
first_week_employed