I have a table like this.
https://i.stack.imgur.com/XpPZi.png
I need to add up the phase duration for each ID and each phase from month to month. If a phase does not occur in a month, then in this month I need the sum from this phase in the previous month, so that in every month is every phase from the past ( value from current month if this phase exist in the month).
Expected result
https://i.stack.imgur.com/ukX0I.png
My issue is, that I don't get the sum for each phase in the month an don't get the phase in the month if this phase is not occur in the month. Can someone please help me?
CREATE TABLE [dbo].[Tab_Status_Test](
[ID] [int] NULL,
[Phase] [nvarchar](50) NULL,
[Phase_duration] [int] NULL,
[EOM_Date] [date] NULL
) ON [PRIMARY]
insert into Tab_status_test
(ID ,Phase,Phase_duration, EOM_Date)
values
('1' ,'C' , '22','2021/02/28')
,('1' ,'A' , '13','2021/03/31')
,('1' ,'A' , '5','2021/03/31')
,('1' ,'B' , '2','2021/03/31')
,('1' ,'B' , '19','2021/04/30')
,('1' ,'A' , '3','2021/04/30')
,('1' ,'B' , '1','2021/04/30')
,('1' ,'A' , '3','2021/04/30')
,('1' ,'B' , '22','2021/05/31')
,('1' ,'C' , '22','2021/06/30')
,('1' ,'D' , '20','2021/07/31')
,('1' ,'A' , '2','2021/07/31')
,('2' ,'C' , '22','2021/02/28')
,('2' ,'A' , '13','2021/03/31')
,('2' ,'A' , '5','2021/03/31')
,('3' ,'B' , '2','2021/03/31')
,('3' ,'B' , '19','2021/04/30')
,('2' ,'A' , '3','2021/04/30')
,('3' ,'B' , '1','2021/04/30')
,('2' ,'A' , '3','2021/04/30')
,('2' ,'B' , '22','2021/05/31')
,('3' ,'C' , '22','2021/06/30')
,('3' ,'D' , '20','2021/07/31')
,('3' ,'A' , '2','2021/07/31')
This is my code
WITH Sum_Dur
AS
(
SELECT ID
,EOM_Date
,phase
,Phase_duration
,LAG(Phase_duration) OVER (Partition BY phase, eom_date ORDER BY phase,eom_date) as PrevEvent
FROM [CM_PT].[dbo].Tab_Status_Test
)
SELECT *,
SUM(PrevEvent Phase_duration) AS SummedCount
FROM Sum_Dur
GROUP BY ID
,EOM_Date
,phase
,Phase_duration
, PrevEvent
CodePudding user response:
The main thing that you want is a running sum, which you can get using SUM() OVER(PARTITION BY ... ORDER BY ...)
.
Since you want to include gaps, you will need to generate a complete set of IDs, Dates and Phases, which you can do with several SELECT DISTINCT ...
subqueries CROSS JOIN
ed together.
Because you data contains multiple entries with activity for the same ID, date, and phase, that data needs to be grouped to avoid duplicate rows in the results.
The final piece is eliminating early results before any activity has occurred. That can be done by wrapping everything else up as another subselect to apply a WHERE Phase_duration > 0
condition.
The result is something like:
SELECT *
FROM (
-- Running totals
SELECT I.ID, D.EOM_Date, P.Phase,
SUM(T.Phase_duration) OVER(Partition by I.ID, P.Phase ORDER BY D.EOM_DATE) AS Phase_duration
FROM (SELECT DISTINCT ID FROM Tab_status_test) I
CROSS JOIN (SELECT DISTINCT EOM_Date FROM Tab_status_test) D
CROSS JOIN (SELECT DISTINCT Phase FROM Tab_status_test) P
LEFT JOIN (
-- Monthly totals
SELECT ID, EOM_Date, Phase, SUM(Phase_duration) AS Phase_duration
FROM Tab_status_test T
GROUP BY ID, EOM_Date, Phase
) T ON T.ID = I.ID AND T.EOM_Date = D.EOM_Date AND T.Phase = P.Phase
) A
WHERE A.Phase_duration > 0
ORDER BY A.ID, A.EOM_Date, A.Phase
Partial results:
ID | EOM_Date | Phase | Phase_duration |
---|---|---|---|
1 | 2021-02-28 | C | 22 |
1 | 2021-03-31 | A | 18 |
1 | 2021-03-31 | B | 2 |
1 | 2021-03-31 | C | 22 |
1 | 2021-04-30 | A | 24 |
1 | 2021-04-30 | B | 22 |
1 | 2021-04-30 | C | 22 |
See this db<>fiddle.
The above assumes that there is at least some activity in every month. If you could potentially have a gap where there is no activity for an entire month, you will need to replace the distinct-date subselect with a calendar gererator.
Your "expected results" were slightly different from the results I got. In particular, the total phase duration for {ID = 1, EOM_Date = 2022-04-30, Phase = B} should be 22 instead of 21 based on your supplied data.
If you want to order your data as C/A/B, you can replace the Phase term in the ORDER BY with a case statement that maps teh values to an alternate sort order. Something like CASE Phase WHEN 'C' THEN 1 WHEN 'A' THEN 2 WHEN 'B' THEN 3 END
.