Example Situation: An order system tracks manually entered due dates by recording a system log date that is always unique (this would be a datetime, but I've used dates for simplicity).
I would like to assign a group or section to each due date grouping that exists continuous, chronologically, without changing the due date.
For example, if April 10 is entered as due date, later changed to April 15, and then changed back to April 10 again, this would be 3 distinct groups/sections. In a more complex model, the sectioning would allow deeper investigation as to why due date changes were entered.
Here is a sample table that has 2 orders, each with a similar set of due date changes.
CREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022', '1/10/2022')
,(1, '4/10/2022', '1/11/2022')
,(1, '4/15/2022', '1/15/2022')
,(1, '4/15/2022', '1/16/2022')
,(1, '4/15/2022', '1/17/2022')
,(1, '4/10/2022', '1/18/2022')
,(1, '4/10/2022', '1/19/2022')
,(1, '4/10/2022', '1/20/2022')
,(2, '4/10/2022', '2/16/2022')
,(2, '4/10/2022', '2/17/2022')
,(2, '4/15/2022', '2/18/2022')
,(2, '4/15/2022', '2/20/2022')
,(2, '4/15/2022', '2/21/2022')
,(2, '4/10/2022', '2/22/2022')
,(2, '4/10/2022', '2/24/2022')
,(2, '4/10/2022', '2/26/2022')
The script should be able to assign a section/group to the due dates of both orders at the same time, as follows:
OrderNo DueDate SysLogDate SectionNumber_WithinDueDate
1 2022-04-10 2022-01-10 1
1 2022-04-10 2022-01-11 1
1 2022-04-15 2022-01-15 2
1 2022-04-15 2022-01-16 2
1 2022-04-15 2022-01-17 2
1 2022-04-10 2022-01-18 3
1 2022-04-10 2022-01-19 3
1 2022-04-10 2022-01-20 3
2 2022-04-10 2022-02-16 1
2 2022-04-10 2022-02-17 1
2 2022-04-15 2022-02-18 2
2 2022-04-15 2022-02-20 2
2 2022-04-15 2022-02-21 2
2 2022-04-10 2022-02-22 3
2 2022-04-10 2022-02-24 3
2 2022-04-10 2022-02-26 3
Below is my initial attempt, using Dense_Rank():
Select *, Dense_Rank() OVER (Partition By OrderNo, DueDate Order By SysLogDate) as SectionNumber_WithinDueDate
From #DueDates
However, it groups all 4/10 due dates together within each order, instead of grouping them distinctly as two separate groups as above.
Incorrect result example:
OrderNo DueDate SysLogDate SectionNumber_WithinDueDate
1 2022-04-10 2022-01-10 1
1 2022-04-10 2022-01-11 2
1 2022-04-10 2022-01-18 3
1 2022-04-10 2022-01-19 4
1 2022-04-10 2022-01-20 5
1 2022-04-15 2022-01-15 1
1 2022-04-15 2022-01-16 2
1 2022-04-15 2022-01-17 3
Thank you for any ideas you may have!
CodePudding user response:
Try the following:
Select B.OrderNo, B.DueDate, B.SysLogDate,
DENSE_RANK() Over (Partition By OrderNo Order By B.grp) SectionNumber_WithinDueDate
From
(
Select D.OrderNo, D.DueDate, D.SysLogDate,
SUM(D.g) Over (Partition By D.OrderNo Order By D.SysLogDate) grp
From(
Select OrderNo, DueDate, SysLogDate,
ABS(DATEDIFF(Day, DueDate, ISNULL(LAG(DueDate) Over (Partition By OrderNo Order By SysLogDate), DueDate))) g
From #DueDates
) D
) B
See a demo from db<>fiddle.
The DATEDIFF
is used to check the difference between the current row 'DueDate' and the previous one (using LAG
function), once there is a difference a new group edge is created.
The cumulative sum SUM(g) Over...
is to define the groups from the group edges.
CodePudding user response:
This essentially requires a gaps and islands solution, which typically is done by subtracting a partitioned sequence from a global sequence.
If you don't actually need the section number to be numbered sequentially and just really need to identify the separate groups you can disregard the outer select below.
select OrderNo, DueDate, SysLogDate,
dense_rank() over(partition by orderno order by gp) SectionNumber_WithinDueDate
from (
select *,
Row_Number() over(partition by OrderNo order by SysLogDate)
- Row_Number() over(partition by OrderNo, DueDate order by SysLogDate) gp
from #DueDates
)t
order by OrderNo, SysLogDate;
Demo Fiddle