I've been working on an issue for a few days now, and I can't seem to find the right fix. Does anybody have an idea?
Case
We want to create a new a new sequence number whenever an employee has resigned for more than 1 day. We have the delta of the current employment record and the previous, so we can check the sequence. We want to calculate the min(Start) and max(End) of each employment record which isn't separated more than 1 day apart.
Data
Employee | Contract | Unit | Start | End | Delta |
---|---|---|---|---|---|
John Doe | 1 | Unit A | 2014-01-01 | 2017-12-31 | NULL |
John Doe | 2 | Unit A | 2018-02-01 | 2018-12-31 | 31 |
John Doe | 3 | Unit B | 2019-01-01 | 2020-05-31 | 1 |
John Doe | 4 | Unit A | 2020-06-01 | NULL | 1 |
With the query it should give back:
Employee | Contract | Unit | Start | End | Delta | Sequence |
---|---|---|---|---|---|---|
John Doe | 1 | Unit A | 2014-01-01 | 2017-12-31 | NULL | 1 |
John Doe | 2 | Unit A | 2018-02-01 | 2018-12-31 | 31 | 2 |
John Doe | 3 | Unit B | 2019-01-01 | 2020-05-31 | 1 | 2 |
John Doe | 4 | Unit A | 2020-06-01 | NULL | 1 | 2 |
That is because sequence 1 end at 31-12-2017, and a new one starts in February of 2018, so there has been more than 1 day of separation between the records. The following all have a sequence of 2 because it is continuing.
Query
I've tried a few things already with lag() and lead(), but I keep working myself into a corner with the data sample that I have. When I run it on the full set it won't work.
SELECT
Employee,
Start,
End,
DeltaPrevious,
Delta,
DeltaNext,
case
when DeltaPrevious IS NULL AND Delta = 1 then 1
when DeltaPrevious = 1 AND Delta > 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
when DeltaPrevious > 1 AND Delta = 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
end as Sequence
FROM
Contracts
ORDER BY
Employee, Start ASC
Hope that someone has a great idea.
Thanks,
CodePudding user response:
If I understood correctly from the definition of Sequence
in your second table, you are more interested in the DeltaNext
than in the Delta(Previous)
. Here an attempt, including the code to create a sample input date with two more employees:
CREATE TABLE #input_table (Employee VARCHAR(255), [Contract] INT, Unit VARCHAR(6), [Start] DATE, [End] DATE)
INSERT INTO #input_table
VALUES
('John Doe', 1, 'Unit A', '2014-01-01', '2017-12-31'),
('John Doe', 2, 'Unit A', '2018-02-01', '2018-12-31'),
('John Doe', 3, 'Unit B', '2019-01-01', '2020-05-31'),
('John Doe', 4, 'Unit A', '2020-06-01', NULL),
('Alice', 1, 'Unit A', '2020-01-01', NULL),
('Bob', 1, 'Unit C', '2020-01-01', '2020-02-20')
First we create the deltas:
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee
ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
INTO #cte_delta -- I'll create a CTE at the end
FROM #input_table
Then we define Sequence
:
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
INTO #cte_sequence
FROM #cte_delta
We then group the same Sequence
s by assigning a unique ROW_NUMBER
for each employee with consecutive/ same Sequence
s:
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
INTO #cte_grp
FROM #cte_sequence
Finally we calculate the min
and max
of the contract duration:
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End])
OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM cte_grp
The COUNT(*)
and COUNT([End])
comparison is necessary or else the ContractEnd
would be the max non-NULL value, i.e. 2018-02-01
.
The whole code with CTE
s here:
WITH cte_delta AS (
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
FROM #input_table
)
, cte_sequence AS (
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
FROM cte_delta
)
, cte_grp AS (
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
FROM cte_sequence
)
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End]) OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM cte_grp
Here the output:
Employee | Contract | Unit | Start | End | DeltaPrev | DeltaNext | Sequence | GRP | ContractStart | ContractEnd |
---|---|---|---|---|---|---|---|---|---|---|
Alice | 1 | Unit A | 2020-01-01 | NULL | NULL | NULL | 2 | 0 | 2020-01-01 | NULL |
Bob | 1 | Unit C | 2020-01-01 | 2020-02-20 | NULL | NULL | 2 | 0 | 2020-01-01 | 2020-02-20 |
John Doe | 1 | Unit A | 2014-01-01 | 2017-12-31 | NULL | 32 | 1 | 0 | 2014-01-01 | 2017-12-31 |
John Doe | 2 | Unit A | 2018-02-01 | 2018-12-31 | 32 | 1 | 2 | 1 | 2018-02-01 | NULL |
John Doe | 3 | Unit B | 2019-01-01 | 2020-05-31 | 1 | 1 | 2 | 1 | 2018-02-01 | NULL |
John Doe | 4 | Unit A | 2020-06-01 | NULL | 1 | NULL | 2 | 1 | 2018-02-01 | NULL |
Feel free to select DISTINCT
records according to your needs.
CodePudding user response:
Basically, you want to use lag()
to get the previous date and then do a cumulative sum. This looks like:
select c.*,
sum(case when prev_end >= dateadd(day, -1, start) then 0 else 1
end) over (partition by employee order by start) as ranking
from (select c.*,
lag(end) over (partition by employee order by start) as prev_end
from contracts c
) c;
You mention that you might want to recalculate the new start
and end
. You would just use the above as a subquery/CTE and aggregate on employee
and ranking
.