Home > Software design >  Start Date end date calculation based on date column
Start Date end date calculation based on date column

Time:12-23

I'm trying to calculate StartDate and EndDate based on a date column from a table. Below is the source table looks like

Scenario 1

ID SERIAL_NUMBER STATUS READ_DT
123456789 42007 D 15-12-2021
123456789 42007 D 16-12-2021
123456789 42007 D 17-12-2021
123456789 42007 D 18-12-2021
123456789 42007 D 19-12-2021
123456789 42007 D 20-12-2021
123456789 42007 D 21-12-2021

I want to calculate start_date and end_date based on READ_DT, for a ID and SERIAL_NUMBER if all READ_DT are available then the output should be as below

ID SERIAL_NUMBER STATUS Start_Date End_Date
123456789 42007 D 15-12-2021 21-12-2021

Scenario 2

ID SERIAL_NUMBER STATUS READ_DT
123456789 42007 D 15-12-2021
123456789 42007 D 16-12-2021
123456789 42007 D 17-12-2021
123456789 42007 D 19-12-2021
123456789 42007 D 20-12-2021
123456789 42007 D 21-12-2021

If there is any gap present in between READ_DT then expected output should be in two transactions as below.

ID SERIAL_NUMBER STATUS Start_Date End_Date
123456789 42007 D 15-12-2021 17-12-2021
123456789 42007 D 19-12-2021 21-12-2021

CodePudding user response:

A little sequential temporal math makes short work of these things.

--===== This will work for either scenario
   WITH cteDTgrp AS
(--==== Subtract an increasing number of days from each date to create the date groups.
 SELECT *
        ,DT_Grp = DATEADD(dd,-ROW_NUMBER() OVER (PARTITION BY ID,SERIAL_NUMBER,STATUS ORDER BY READ_DT),READ_DT)
   FROM dbo.YourTableNameHere
)--==== Then the grouping to get the start and end dates is trivial.
 SELECT ID,SERIAL_NUMBER,STATUS
        ,Start_Date = MIN(READ_DT)
        ,End_Date   = MAX(READ_DT)
   FROM cteDTgrp
  GROUP BY ID,SERIAL_NUMBER,STATUS,DT_Grp --<----This is the key!
  ORDER BY ID,SERIAL_NUMBER,STATUS,Start_Date
;

Note that this will only work if the READ_DT is unique for each group of ID,SERIAL_NUMBER,STATUS.

CodePudding user response:

For scenario 1 you can direct use aggregate min and max functions group by remaining columns.

select ID,SERIAL_NUMBER, STATUS, convert(varchar, min(READ_DT), 105) as Start_Date, convert(varchar, max(READ_DT), 105) as End_Date 
from tb1
group by ID,SERIAL_NUMBER, STATUS

enter image description here

For Scenario 2, I used the LAG function to get the difference of date of the current row compared to the previous row and performed aggregation after that.

This code worked for scenarios 1 & 2 data.

Code:

   drop table if exists #t1 
  --stores diff_days and missing date from sequence
   SELECT READ_DT,
     case when DATEDIFF(day, LAG(READ_DT) OVER (ORDER BY READ_DT), READ_DT ) is NULL then 1 
     else DATEDIFF(day, LAG(READ_DT) OVER (ORDER BY READ_DT), READ_DT )
     end AS diff_day
    ,case when DATEDIFF(day, LAG(READ_DT) OVER (ORDER BY READ_DT), READ_DT ) >1 then DATEADD(day, -1, READ_DT)
     end as diff_read_dt
    into #t1
    from tb2

   --update diff_day column where date greater that missing date to aggregate on the result set
   update #t1
   set diff_day = diff_day 1
   where convert(date,READ_DT) > (select dateadd(day,1,convert(date,diff_read_dt)) from #t1 where diff_read_dt is not null)

  --get the required results using min and max

   select a.ID, a.SERIAL_NUMBER, a.STATUS, convert(varchar, min(a.READ_DT), 105) as Start_Date, convert(varchar, max(a.READ_DT), 105) as End_Date 
   from tb2 a
   inner join #t1 b on convert(date,a.READ_DT) = convert(date,b.READ_DT)
   group by a.ID, a.SERIAL_NUMBER, a.STATUS, b.diff_day

enter image description here

  • Related