Group manually entered date values depending on whether they are continuously the same over system l

Time:11-11

This is a follow-up question to my initial post

Example Situation: An order system tracks manually entered due dates by recording a system log date that is always unique (this would be a datetime, but I've used dates for simplicity, making each unique).

I would like to assign a section number to each due date grouping where the due date remains the same chronologically.

Stu's response solved the table in my initial post, but I've noticed that if I change the 4/15/2022 due date associated with the SysLogDate of 1/16/2022 to 4/13/2022, the desired section numbering is no longer produced:

Note: The 4/13/2022 date is an arbitrary change. The same issue occurs if I use any other date that is not already in the DueDate column. Ultimately, I also need to be able to handle changes to/from NULL, where someone 'forgets' to enter the date, but replacing the date with NULL yields the same issue.

Updated Table:

CREATE TABLE #DueDates (OrderNo INT, DueDate Date, SysLogDate Date)
INSERT INTO #DueDates Values (1, '4/10/2022',   '1/10/2022')
                            ,(1, '4/10/2022',   '1/11/2022')
                            ,(1, '4/15/2022',   '1/15/2022')
                            ,(1, '4/13/2022',   '1/16/2022')  -- Due Date Altered since prior post
                            ,(1, '4/15/2022',   '1/17/2022')
                            ,(1, '4/10/2022',   '1/18/2022')
                            ,(1, '4/10/2022',   '1/19/2022')
                            ,(1, '4/10/2022',   '1/20/2022')
                            ,(2, '4/10/2022',   '2/16/2022')
                            ,(2, '4/10/2022',   '2/17/2022')
                            ,(2, '4/15/2022',   '2/18/2022')
                            ,(2, '4/15/2022',   '2/20/2022')
                            ,(2, '4/15/2022',   '2/21/2022')
                            ,(2, '4/10/2022',   '2/22/2022')
                            ,(2, '4/10/2022',   '2/24/2022')
                            ,(2, '4/10/2022',   '2/26/2022') 

Desired Results Are:

    OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
    1       2022-04-10      2022-01-10      1
    1       2022-04-10      2022-01-11      1
    1       2022-04-15      2022-01-15      2
    1       2022-04-13      2022-01-16      3
    1       2022-04-15      2022-01-17      4
    1       2022-04-10      2022-01-18      5
    1       2022-04-10      2022-01-19      5
    1       2022-04-10      2022-01-20      5
    2       2022-04-10      2022-02-16      1
    2       2022-04-10      2022-02-17      1
    2       2022-04-15      2022-02-18      2
    2       2022-04-15      2022-02-20      2
    2       2022-04-15      2022-02-21      2
    2       2022-04-10      2022-02-22      3
    2       2022-04-10      2022-02-24      3
    2       2022-04-10      2022-02-26      3

...but applying the solution from my prior post to this updated table yields:

    OrderNo DueDate         SysLogDate      SectionNumber_WithinDueDate
    1       2022-04-10      2022-01-10      1
    1       2022-04-10      2022-01-11      1
    1       2022-04-15      2022-01-15      2
    1       2022-04-13      2022-01-16      3 **
    1       2022-04-15      2022-01-17      3 **
    1       2022-04-10      2022-01-18      3 **
    1       2022-04-10      2022-01-19      3 **
    1       2022-04-10      2022-01-20      3 **
    2       2022-04-10      2022-02-16      1
    2       2022-04-10      2022-02-17      1
    2       2022-04-15      2022-02-18      2
    2       2022-04-15      2022-02-20      2
    2       2022-04-15      2022-02-21      2
    2       2022-04-10      2022-02-22      3
    2       2022-04-10      2022-02-24      3
    2       2022-04-10      2022-02-26      3

Here's a demo that uses the above updated table and the solution from my prior post, and shows the above undesired results: Fiddle

Demo showing same effect when the date is replaced with NULL: Fiddle with NULL

Copy of the selected solution from my prior post (used in the above Fiddles):

select OrderNo, DueDate, SysLogDate,
       dense_rank() over(partition by OrderNo order by gp) SectionNumber_WithinDueDate
from (
    select *,
           Row_Number() over(partition by OrderNo order by SysLogDate)
           - Row_Number() over(partition by OrderNo, DueDate order by SysLogDate) gp
    from #DueDates
) t
order by OrderNo, SysLogDate;

It's a small change in the data, but I haven't been able to work out how to alter the 'Row_Number difference line' in the subquery to get the desired results.

Thank you for any advice you can offer here :)

Answer:

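For gaps-and-islands problems, I prefer the lag() window function because it is easier to understand.

Use lag() to compare each row's DueDate with the previous row's value (within the same OrderNo, ordered by SysLogDate) and set a flag to 1 whenever it changes. A cumulative sum of that flag gives you grp, which stays constant within a run of identical due dates and increases at every change. Applying dense_rank() to grp then gives you your SectionNumber_WithinDueDate.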

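As you have NULL values, wrap both sides of the comparison in ISNULL() so that NULL is replaced by a placeholder date ('99991231'). This assumes that 9999-12-31 never occurs as a real due date.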
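A minimal sketch of why the ISNULL() wrapper matters, using literal values rather than the table. In T-SQL (with the default ANSI_NULLS setting) a comparison against NULL evaluates to UNKNOWN, so the CASE falls through to ELSE and a change to or from NULL would not raise the flag. The aliases without_isnull and with_isnull are made up for this illustration.

```sql
-- Compare a NULL "previous" due date with a real "current" due date.
select
    -- NULL <> date is UNKNOWN, so the ELSE branch is taken and the change is missed
    without_isnull = case when cast(null as date) <> cast('20220415' as date)
                          then 1 else 0 end,
    -- with the placeholder date on both sides, the comparison is a plain date comparison
    with_isnull    = case when isnull(cast(null as date), '99991231')
                            <> isnull(cast('20220415' as date), '99991231')
                          then 1 else 0 end;
-- without_isnull = 0, with_isnull = 1
```

The same effect is what makes the very first row of each order get flagged: lag() returns NULL there, and the placeholder turns that into a detectable change whenever the first DueDate is not NULL.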

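select OrderNo, DueDate, SysLogDate, 
       -- renumber the groups 1, 2, 3, ... within each order
       SectionNumber_WithinDueDate = dense_rank() over (partition by OrderNo 
                                                            order by grp)
from
(
    -- running total of the change flags: rows in the same run share one grp value
    select *, grp = sum(g) over (partition by OrderNo 
                                     order by SysLogDate)
    from   
    ( 
           -- g = 1 when DueDate differs from the previous row's DueDate
           -- (NULLs on either side are replaced by the placeholder date first)
           select *,
                  g = case when isnull(DueDate, '99991231')
                           <>   isnull(lag(DueDate) over (partition by OrderNo 
                                                              order by SysLogDate), '99991231')
                           then 1
                           else 0
                           end
           from   #DueDates
   ) d
) d
order by OrderNo, SysLogDate;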

Fiddles on your sample data:

fiddle 1

fiddle 2
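As a side note on why the Row_Number difference stops working on the updated data: the difference gp is only guaranteed to be distinct between runs of the same DueDate, not between different DueDate values. The sketch below assumes the #DueDates table from the question exists in the session. The first query exposes gp for OrderNo 1, and the second is one possible way (not the approach from the prior post) to keep the Row_Number technique by ranking each (DueDate, gp) island by its earliest SysLogDate. The names rn_all, rn_by_due and island_start are introduced here for illustration only.

```sql
-- 1) Diagnostic: show both row numbers and their difference for OrderNo 1.
select OrderNo, DueDate, SysLogDate,
       rn_all    = row_number() over (partition by OrderNo order by SysLogDate),
       rn_by_due = row_number() over (partition by OrderNo, DueDate order by SysLogDate),
       gp        = row_number() over (partition by OrderNo order by SysLogDate)
                 - row_number() over (partition by OrderNo, DueDate order by SysLogDate)
from   #DueDates
where  OrderNo = 1
order by SysLogDate;
-- gp comes out as 0, 0, 2, 3, 3, 3, 3, 3: the 4/13, 4/15 and 4/10 runs
-- from 1/16 onwards all share gp = 3, so dense_rank() over gp merges them.

-- 2) Possible fix: treat (DueDate, gp) as the island key and order the
--    islands by the first SysLogDate they contain.
select OrderNo, DueDate, SysLogDate,
       SectionNumber_WithinDueDate = dense_rank() over (partition by OrderNo
                                                            order by island_start)
from
(
    select *,
           island_start = min(SysLogDate) over (partition by OrderNo, DueDate, gp)
    from
    (
        select *,
               gp = row_number() over (partition by OrderNo order by SysLogDate)
                  - row_number() over (partition by OrderNo, DueDate order by SysLogDate)
        from   #DueDates
    ) t
) t
order by OrderNo, SysLogDate;
```

For OrderNo 1 the island start dates are 1/10, 1/15, 1/16, 1/17 and 1/18, which yields section numbers 1, 1, 2, 3, 4, 5, 5, 5. Window partitions place NULL due dates together in one group, so this variant should not need a placeholder date, though the lag() version above is arguably easier to read.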
