Home > front end >  Date span over tables
Date span over tables

Time:08-02

I want to show all profit centers a user had. The necessary information are in two tables.

  1. TerritoryAssignment
 -------------0---------- ------------ ------------ -------------- 
| TerritoryID | DBUserID | ValidFrom  | ValidThru  | AssignmentID |
 ------------- ---------- ------------ ------------ -------------- 
| T1          | 472      | 2019-03-01 | 2019-12-31 | 1389         |
| T4          | 472      | 2020-01-01 | 2020-10-31 | 2105         |
| T8          | 472      | 2020-11-01 | 2021-09-12 | 2226         |
| T12         | 472      | 2021-09-13 | 2021-11-30 | 2578         |
| T2          | 472      | 2021-12-01 | 9999-12-31 | 2659         |
 ------------- ---------- ------------ ------------ -------------- 
  1. TerritoryDetails
 ----------- -------------- ------------ ------------ -------------------- 
TerritoryID | ProfitCenter | ValidFrom  | ValidThru  | TerritoryDetailsID |
 ----------- -------------- ------------ ------------ -------------------- 
| T2        | P05          | 2021-12-01 | 2022-04-30 | 983                |
| T2        | P18          | 2022-05-01 | 9999-12-31 | 1029               |
| T1        | P45          | 2012-09-01 | 9999-12-31 | 502                |
| T4        | P23          | 2020-01-01 | 9999-12-31 | 755                |
| T12       | P05          | 2020-01-01 | 9999-12-31 | 846                |
| T8        | P18          | 2020-01-01 | 9999-12-31 | 956                |
 ----------- -------------- ------------ ------------ -------------------- 

Both tables are joined over the field TerritoryID.

As you can see, the user hat profit center P18 at two different time periods. Therefore a simple MIN and MAX with grouping is not possible. Also see that there are two records for profit center P05 which should be aggregated into one.

I want to get all Profit Centers a user had in the past including the exact start date and end date. The output could look like this:

 ---------- ------------ ------------ -------------- 
| DBUserID | ValidFrom  | ValidThru  | ProfitCenter |
 ---------- ------------ ------------ -------------- 
| 472      | 2019-03-01 | 2019-12-31 | P45          |
| 472      | 2020-01-01 | 2020-10-31 | P23          |
| 472      | 2020-11-01 | 2021-09-12 | P18          |
| 472      | 2021-09-13 | 2022-04-30 | P05          |
| 472      | 2022-05-01 | 9999-12-31 | P18          |
 ---------- ------------ ------------ -------------- 

The problem is:

  1. A user can have different territory assignments over the time -> multiple entries in table TerritoryAssignment for a user.
  2. Territory details can change -> multiple entries in table TerritoryDetails for a territory.
  3. The profit center can be the same over several TerritoryDatails records.
  4. The profit canter can be appear at another territory as well.
  5. The start and end dates in the records of both tables are independent and therefore cannot used for a join.

I already tried some CTE with ROW_NUMBER() but was not successful. Here my last try, not giving the correct result:

SELECT
   DBUserID,
   ProfitCenter, 
   MIN(ValidFrom) AS IslandStartDate,
   MAX(ValidThru) AS IslandEndDate
FROM
    (
     SELECT
        *,
        CASE WHEN Groups.PreviousEndDate >= ValidFrom THEN 0 ELSE 1 END AS IslandStartInd,
        SUM(CASE WHEN Groups.PreviousEndDate >= ValidFrom THEN 0 ELSE 1 END) OVER (ORDER BY Groups.RN) AS IslandId
     FROM
    (
    SELECT
       ROW_NUMBER() OVER(ORDER BY DBUserID,ProfitCenter,TD.ValidFrom,TD.ValidThru) AS RN,
       DBUserID,
       ProfitCenter,
       IIF(TA.ValidFrom<TD.ValidFrom,TD.ValidFrom,TA.ValidFrom) AS ValidFrom,
       IIF(TA.ValidThru>TD.ValidThru,TD.ValidThru,TA.ValidThru) AS ValidThru,
       LAG(TD.ValidFrom,1) OVER (ORDER BY DBUserID,ProfitCenter,TD.ValidFrom,TD.ValidThru) AS PreviousEndDate
    FROM
       dbo.TerritoryDetails TD INNER JOIN dbo.TerritoryAssignment TA ON TD.TerritoryID=TA.TerritoryID
    WHERE TA.DBUserID=472
       AND (
           (TD.ValidFrom<=TA.ValidFrom AND TD.ValidThru>=TA.ValidFrom)
              OR (TD.ValidFrom>=TA.ValidFrom AND TD.ValidThru<=TA.ValidThru)
              OR (TD.ValidFrom<=TA.ValidThru AND TD.ValidThru>=TA.ValidThru)
           )
    ) Groups
) Islands
GROUP BY
   IslandId,DBUserID,ProfitCenter
ORDER BY 
   IslandStartDate desc

Does anybody can help?

CodePudding user response:

The first problem here is finding intervals of time which overlap.

This actually turns out to be a fairly trivial expression. Given two intervals T1 and T2, they overlap if:

T1.start <= T2.end and T1.end >= T2.start

Here is a good explanation of why that works

The next problem is that once we have constructed the overlapped intervals, those intervals may be consecutive, ie, one interval starts the day after a prior interval ends. You want to "concatenate" those intervals. An example of this can be seen in your sample data for P05, where the following intervals exist:

2021-09-13 to 2021-11-30
2021-12-01 to 2022-04-30

And your desired result shows the total interval 2021-09-13 to 2022-04-30

We can use some running-sum "magic" to find consecutive intervals and merge them together.

Here is the complete logic broken up into distinct CTEs to make it easier to understand. Comments inline to explain the logic.

If you insert a select for each CTE in turn and look at the output up to that point it may help with understanding the overall logic.

create table #territoryassignment
(
   TerritoryID char(3), 
   DBUserID int,
   ValidFrom date,
   ValidThru date,
   AssignmentID int
);

create table #territoryDetails
(
   TerritoryID char(3),
   ProfitCenter char(3),
   ValidFrom date,
   ValidThru date,
   TerritoryDetailsID int
);

insert #territoryassignment values
('T1 ', 472, '2019-03-01', '2019-12-31', 1389),
('T4 ', 472, '2020-01-01', '2020-10-31', 2105),
('T8 ', 472, '2020-11-01', '2021-09-12', 2226),
('T12', 472, '2021-09-13', '2021-11-30', 2578),
('T2 ', 472, '2021-12-01', '9999-12-31', 2659)

insert #territoryDetails values
('T2 ', 'P05', '2021-12-01', '2022-04-30', 983 ),
('T2 ', 'P18', '2022-05-01', '9999-12-31', 1029),
('T1 ', 'P45', '2012-09-01', '9999-12-31', 502 ),
('T4 ', 'P23', '2020-01-01', '9999-12-31', 755 ),
('T12', 'P05', '2020-01-01', '9999-12-31', 846 ),
('T8 ', 'P18', '2020-01-01', '9999-12-31', 956 );

-- Find overlapping intervals.
-- An overlap exists if T1.start <= T2.end and T1.end >= T2.start
-- so include that predicate in the join.
-- Where an overlap is found, construct a new interval consisting of the overlapped section
with intervals as
(
   select   ta.DBUserID,
            td.ProfitCenter,
            ValidFrom = iif(td.ValidFrom > ta.ValidFrom, td.ValidFrom, ta.ValidFrom),
            ValidThru = iif(td.ValidThru < ta.ValidThru, td.ValidThru, ta.ValidThru)
   from     #territoryAssignment ta
   join     #territoryDetails    td on td.territoryID = ta.territoryId
                                       and td.validFrom <= ta.ValidThru
                                       and td.ValidThru >= ta.ValidFrom
),

-- For each interval, 
-- if the prior interval ended yesterday
-- then it is not gapped, otherwise it is.
gaps as
(
   select   DbUserId,
            ProfitCenter,
            ValidFrom,
            ValidThru,
            gap = iif
                  (
                     lag(ValidThru) over 
                     (
                        partition by DBUserId, profitCenter
                        order by ValidFrom asc
                     ) = dateadd(day, -1, ValidFrom),
                     0, 1
                  )
   from     intervals
),
-- gapCount is the running total number of gaps seen 
-- for a given user and profit center
-- in order of the interval start dates.
-- Consecutive intervals will have the same gapCount
-- non consecutive intervals will have an ever increasing gapCount.
grouped as
(
   select   DBUserId,
            ProfitCenter,
            ValidFrom,
            ValidThru,
            gapCount = sum(gap) over 
                       (
                          partition by DBUserId, ProfitCenter 
                          order by ValidFrom 
                          rows unbounded preceding
                       )
   from     gaps
)
-- "Merge" consecutive intervals by grouping on the gapCount
select   DBUserId,
         ProfitCenter,
         ValidFrom = min(ValidFrom),
         ValidThru = max(ValidThru)
from     grouped
group by DBUserId,
         ProfitCenter,
         gapCount
order by DBUserId,
         ProfitCenter,
         ValidFrom;
  • Related