Home > front end >  Summarize values from month to month
Summarize values from month to month

Time:01-10

I have a table like this.

https://i.stack.imgur.com/XpPZi.png

I need to add up the phase duration for each ID and each phase from month to month. If a phase does not occur in a month, then in this month I need the sum from this phase in the previous month, so that in every month is every phase from the past ( value from current month if this phase exist in the month).

Expected result

https://i.stack.imgur.com/ukX0I.png

My issue is, that I don't get the sum for each phase in the month an don't get the phase in the month if this phase is not occur in the month. Can someone please help me?

CREATE TABLE [dbo].[Tab_Status_Test](
    [ID] [int] NULL,
    [Phase] [nvarchar](50) NULL,
    [Phase_duration] [int] NULL,
    [EOM_Date] [date] NULL
) ON [PRIMARY]
insert into Tab_status_test
(ID ,Phase,Phase_duration, EOM_Date)
  values
     ('1' ,'C' , '22','2021/02/28')
     ,('1' ,'A' , '13','2021/03/31')
    ,('1' ,'A' , '5','2021/03/31')
    ,('1' ,'B' , '2','2021/03/31')
    ,('1' ,'B' , '19','2021/04/30')
    ,('1' ,'A' , '3','2021/04/30')
    ,('1' ,'B' , '1','2021/04/30')
    ,('1' ,'A' , '3','2021/04/30')
    ,('1' ,'B' , '22','2021/05/31')
    ,('1' ,'C' , '22','2021/06/30')
    ,('1' ,'D' , '20','2021/07/31')
    ,('1' ,'A' , '2','2021/07/31')
    ,('2' ,'C' , '22','2021/02/28')
     ,('2' ,'A' , '13','2021/03/31')
    ,('2' ,'A' , '5','2021/03/31')
    ,('3' ,'B' , '2','2021/03/31')
    ,('3' ,'B' , '19','2021/04/30')
    ,('2' ,'A' , '3','2021/04/30')
    ,('3' ,'B' , '1','2021/04/30')
    ,('2' ,'A' , '3','2021/04/30')
    ,('2' ,'B' , '22','2021/05/31')
    ,('3' ,'C' , '22','2021/06/30')
    ,('3' ,'D' , '20','2021/07/31')
    ,('3' ,'A' , '2','2021/07/31')

This is my code

WITH Sum_Dur
AS
(
SELECT  ID
          ,EOM_Date
          ,phase
          ,Phase_duration

,LAG(Phase_duration) OVER (Partition BY phase, eom_date ORDER BY phase,eom_date) as PrevEvent
FROM [CM_PT].[dbo].Tab_Status_Test
)

    SELECT *,
    SUM(PrevEvent Phase_duration) AS SummedCount

     FROM Sum_Dur
     GROUP BY   ID
          ,EOM_Date
          ,phase
          ,Phase_duration
          , PrevEvent

CodePudding user response:

The main thing that you want is a running sum, which you can get using SUM() OVER(PARTITION BY ... ORDER BY ...).

Since you want to include gaps, you will need to generate a complete set of IDs, Dates and Phases, which you can do with several SELECT DISTINCT ... subqueries CROSS JOINed together.

Because you data contains multiple entries with activity for the same ID, date, and phase, that data needs to be grouped to avoid duplicate rows in the results.

The final piece is eliminating early results before any activity has occurred. That can be done by wrapping everything else up as another subselect to apply a WHERE Phase_duration > 0 condition.

The result is something like:

SELECT *
FROM (
    -- Running totals
    SELECT I.ID, D.EOM_Date, P.Phase,
        SUM(T.Phase_duration) OVER(Partition by I.ID, P.Phase ORDER BY D.EOM_DATE) AS Phase_duration
    FROM (SELECT DISTINCT ID FROM Tab_status_test) I
    CROSS JOIN (SELECT DISTINCT EOM_Date FROM Tab_status_test) D
    CROSS JOIN (SELECT DISTINCT Phase FROM Tab_status_test) P
    LEFT JOIN (
        -- Monthly totals
        SELECT ID, EOM_Date, Phase, SUM(Phase_duration) AS Phase_duration
        FROM Tab_status_test T
        GROUP BY ID, EOM_Date, Phase
    ) T ON T.ID = I.ID AND T.EOM_Date = D.EOM_Date AND T.Phase = P.Phase
) A
WHERE A.Phase_duration > 0
ORDER BY A.ID, A.EOM_Date, A.Phase

Partial results:

ID EOM_Date Phase Phase_duration
1 2021-02-28 C 22
1 2021-03-31 A 18
1 2021-03-31 B 2
1 2021-03-31 C 22
1 2021-04-30 A 24
1 2021-04-30 B 22
1 2021-04-30 C 22

See this db<>fiddle.

The above assumes that there is at least some activity in every month. If you could potentially have a gap where there is no activity for an entire month, you will need to replace the distinct-date subselect with a calendar gererator.

Your "expected results" were slightly different from the results I got. In particular, the total phase duration for {ID = 1, EOM_Date = 2022-04-30, Phase = B} should be 22 instead of 21 based on your supplied data.

If you want to order your data as C/A/B, you can replace the Phase term in the ORDER BY with a case statement that maps teh values to an alternate sort order. Something like CASE Phase WHEN 'C' THEN 1 WHEN 'A' THEN 2 WHEN 'B' THEN 3 END.

  • Related