Home > database >  Merging consecutive dates for a group together
Merging consecutive dates for a group together

Time:05-10

I have a table with employees that looks like this:

Name Department Manager Date
Employee 1 Dept 1 Manager X 202101
Employee 1 Dept 1 Manager X 202102
Employee 1 Dept 2 Manager X 202103
Employee 1 Dept 2 Manager X 202104
Employee 1 Dept 1 Manager X 202105
Employee 1 Dept 1 Manager X 202106
Employee 2 Dept 1 Manager X 202101
Employee 2 Dept 1 Manager X 202102

I need to build a view that shows the data in the following format:

Name Department Manager Valid_From Valid_To
Employee 1 Dept 1 Manager X 202101 202102
Employee 1 Dept 2 Manager X 202103 202104
Employee 1 Dept 1 Manager X 202105 999912
Employee 2 Dept 1 Manager X 202101 999912

So far, this is what why code looks like:

WITH cte AS
(
   SELECT [Name], Department, Manager, Valid_From = min([Date]), Valid_To = max([Date]),
      RowNum = ROW_NUMBER() OVER (PARTITION BY [Name], ORDER BY max([Date]) DESC)
   FROM TestingTable
   WHERE ([Date] IS NOT NULL)
   GROUP BY [Name], Department, Manager
)
SELECT [Name], Department, Manager, Valid_From,
    CASE WHEN RowNum = 1 THEN 999912 ELSE Valid_To END AS Valid_To, CASE WHEN RowNum = 1 THEN 1 ELSE 0 END AS Is_Latest
FROM cte

The output is this - it groups the intervals that employee 1 has worked in department 1, while I need it in 2 different chronological intervals.

Name Department Manager Valid_From Valid_To
Employee 1 Dept 1 Manager X 202101 999912
Employee 1 Dept 2 Manager X 202103 202104
Employee 2 Dept 1 Manager X 202101 999912

I experimented a bit with the lag and lead functions to compare the dates, but I'm lost.

CodePudding user response:

Looks like you need to group consecutive year-months per employee-department-manager together. It could be done like so:

with cte1 as (
    select name
         , department
         , manager
         , datefromparts(date / 100, date % 100, 1) as yymm
    from t
), cte2 as (
    select *
         , case when lag(yymm) over (partition by name, department, manager order by yymm) = dateadd(month, -1, yymm) then 0 else 1 end as new_grp
    from cte1
), cte3 as (
    select *
         , sum(new_grp) over (partition by name, department, manager order by yymm) as grp_num
    from cte2
)
select name
     , department
     , manager
     , min(yymm) as valid_from
     , max(yymm) as valid_to
from cte3
group by name, department, manager, grp_num
order by name, valid_from, department, manager

Note that I had to convert the year-months to dates for easier comparison. Result:

name department manager valid_from valid_to
Employee 1 Dept 1 Manager X 2021-01-01 2021-02-01
Employee 1 Dept 2 Manager X 2021-03-01 2021-04-01
Employee 1 Dept 1 Manager X 2021-05-01 2021-06-01
Employee 2 Dept 1 Manager X 2021-01-01 2021-02-01

Replacing the last valid_to for each employee with 9999-12-01 is trivial e.g. you can check if lead(valid_from) over (partition by name order by valid_from) is null.

  • Related