Home > other >  Proportional distribution of a given value between two dates in SQL Server
Proportional distribution of a given value between two dates in SQL Server

Time:11-08

There's a table with three columns: start date, end date and task duration in hours. For example, something like that:

Id StartDate EndDate Duration
1 07-11-2022 15-11-2022 40
2 02-09-2022 02-11-2022 122
3 10-10-2022 05-11-2022 52

And I want to get a table like that:

Id Month HoursPerMonth
1 11 40
2 09 56
2 10 62
2 11 4
3 10 42
3 11 10

Briefly, I wanted to know, how many working hours is in each month between start and end dates. Proportionally. How can I achieve that by MS SQL Query? Data is quite big so the query speed is important enough. Thanks in advance!

I've tried DATEDIFF and EOMONTH, but that solution doesn't work with tasks > 2 months. And I'm sure that this solution is bad decision. I hope, that it can be done more elegant way.

CodePudding user response:

Here is an option using an ad-hoc tally/calendar table

Not sure I'm agree with your desired results

Select ID
      ,Month = month(D)
      ,HoursPerMonth = (sum(1.0) / (1 max(datediff(DAY,StartDate,EndDate)))) * max(Duration)
 From YourTable A
 Join (
        Select Top 75000 D=dateadd(day,Row_Number() Over (Order By (Select NULL)),0) 
         From  master..spt_values n1, master..spt_values n2
      ) B on D between StartDate and EndDate
  Group By ID,month(D)
  Order by ID,Month

Results

enter image description here

CodePudding user response:

This answer uses CTE recursion.

This part just sets up a temp table with the OP's example data.

DECLARE @source
  TABLE (
            SOURCE_ID INT
           ,STARTDATE DATE
           ,ENDDATE   DATE
           ,DURATION  INT
        )
;

INSERT
  INTO
        @source
VALUES
        (1, '20221107', '20221115', 40 )
       ,(2, '20220902', '20221102', 122 )
       ,(3, '20221010', '20221105', 52 )
;

This part is the query based on the above data. The recursive CTE breaks the time period into months. The second CTE does the math. The final selection does some more math and presents the results the way you want to seem them.

WITH CTE AS (
    SELECT
            SRC.SOURCE_ID
           ,SRC.STARTDATE
           ,SRC.ENDDATE
           ,SRC.STARTDATE                                  AS 'INTERIM_START_DATE'
           ,CASE WHEN EOMONTH(SRC.STARTDATE) < SRC.ENDDATE
                 THEN EOMONTH(SRC.STARTDATE)
                 ELSE SRC.ENDDATE
            END                                            AS 'INTERIM_END_DATE'
           ,SRC.DURATION
      FROM
            @source  SRC
    UNION ALL
    SELECT
            CTE.SOURCE_ID
           ,CTE.STARTDATE
           ,CTE.ENDDATE
           ,CASE WHEN EOMONTH(CTE.INTERIM_START_DATE) < CTE.ENDDATE
                 THEN DATEADD( DAY, 1, EOMONTH(CTE.INTERIM_START_DATE) )
                 ELSE CTE.STARTDATE
            END
           ,CASE WHEN EOMONTH(CTE.INTERIM_START_DATE, 1) < CTE.ENDDATE
                 THEN EOMONTH(CTE.INTERIM_START_DATE, 1)
                 ELSE CTE.ENDDATE
            END
           ,CTE.DURATION
      FROM
            CTE
     WHERE
            CTE.INTERIM_END_DATE < CTE.ENDDATE
)
, CTE2 AS (
SELECT
        CTE.SOURCE_ID
       ,CTE.STARTDATE
       ,CTE.ENDDATE
       ,CTE.INTERIM_START_DATE
       ,CTE.INTERIM_END_DATE
       ,CAST( DATEDIFF( DAY, CTE.INTERIM_START_DATE, CTE.INTERIM_END_DATE )   1 AS FLOAT ) AS 'MNTH_DAYS'
       ,CAST( DATEDIFF( DAY, CTE.STARTDATE, CTE.ENDDATE )   1 AS FLOAT )                   AS 'TTL_DAYS'
       ,CAST( CTE.DURATION AS FLOAT )                                                      AS 'DURATION'
  FROM
        CTE
)
  SELECT
          CTE2.SOURCE_ID                                            AS 'Id'
         ,MONTH( CTE2.INTERIM_START_DATE )                          AS 'Month'
         ,ROUND( CTE2.MNTH_DAYS/CTE2.TTL_DAYS * CTE2.DURATION, 0 )  AS 'HoursPerMonth'
         
    FROM
          CTE2
ORDER BY
          CTE2.SOURCE_ID
         ,CTE2.INTERIM_END_DATE
;

My results agree with Mr. Cappelletti's, not the OP's. Perhaps some tweaking regarding the definition of a "Day" is needed. I don't know.

If time between start and end date is large (more than 100 months) you may want to specify OPTION (MAXRECURSION 0) at the end.

  • Related