How to calculate an an accumulated value conditionally?-CodePudding

This question is based on this thread.

I have the following dataframe:

diff_hours   stage    sensor
0            0        20
0            0        21
0            0        21
1            0        22
5            0        21
0            0        22
0            1        20
7            1        23
0            1        24
0            3        25
0            3        28
6            0        21
0            0        22

I need to calculated an accumulated value of diff_hours while stage is growing. When stage drops to 0, the accumulated value acc_hours should restart to 0 even though diff_hours might not be equal to 0.

The proposed solution is this one:

blocks = df['stage'].diff().lt(0).cumsum()
df['acc_hours'] = df['diff_hours'].groupby(blocks).cumsum()

Output:

    diff_hours  stage  sensor  acc_hours
0            0      0      20          0
1            0      0      21          0
2            0      0      21          0
3            1      0      22          1
4            5      0      21          6
5            0      0      22          6
6            0      1      20          6
7            7      1      23         13
8            0      1      24         13
9            0      3      25         13
10           0      3      28         13
11           6      0      21          6
12           0      0      22          6

On the line 11 the value of acc_hours is equal to 6. I need it to be restarted to 0, because the stage dropped from 3 back to 0 in row 11.

The expected output:

    diff_hours  stage  sensor  acc_hours
0            0      0      20          0
1            0      0      21          0
2            0      0      21          0
3            1      0      22          1
4            5      0      21          6
5            0      0      22          6
6            0      1      20          6
7            7      1      23         13
8            0      1      24         13
9            0      3      25         13
10           0      3      28         13
11           6      0      21          0
12           0      0      22          0

How can I implement this logic?

CodePudding user response：

The expected output is unclear, what about a simple mask?

Masking only the value during the change:

m = df['stage'].diff().lt(0)
df['acc_hours'] = (df.groupby(m.cumsum())
                     ['diff_hours'].cumsum()
                     .mask(m, 0)
                  )

Output:

    diff_hours  stage  sensor  acc_hours
0            0      0      20          0
1            0      0      21          0
2            0      0      21          0
3            1      0      22          1
4            5      0      21          6
5            0      0      22          6
6            0      1      20          6
7            7      1      23         13
8            0      1      24         13
9            0      3      25         13
10           0      3      28         13
11           6      0      21          0
12           0      0      22          6
13           3      0      22          9
14           0      0      22          9

Or ignoring the value completely bu masking before groupby:

m = df['stage'].diff().lt(0)
df['acc_hours'] = (df['diff_hours'].mask(m, 0)
                     .groupby(m.cumsum())
                     .cumsum()
                  )

Output:

    diff_hours  stage  sensor  acc_hours
0            0      0      20          0
1            0      0      21          0
2            0      0      21          0
3            1      0      22          1
4            5      0      21          6
5            0      0      22          6
6            0      1      20          6
7            7      1      23         13
8            0      1      24         13
9            0      3      25         13
10           0      3      28         13
11           6      0      21          0
12           0      0      22          0
13           3      0      22          3
14           0      0      22          3