This question is based on this thread.
I have the following dataframe:
diff_hours stage sensor
0 0 20
0 0 21
0 0 21
1 0 22
5 0 21
0 0 22
0 1 20
7 1 23
0 1 24
0 3 25
0 3 28
6 0 21
0 0 22
I need to calculated an accumulated value of diff_hours
while stage
is growing. When stage drops to 0, the accumulated value acc_hours
should restart to 0 even though diff_hours
might not be equal to 0.
The proposed solution is this one:
blocks = df['stage'].diff().lt(0).cumsum()
df['acc_hours'] = df['diff_hours'].groupby(blocks).cumsum()
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 6
12 0 0 22 6
On the line 11 the value of acc_hours
is equal to 6. I need it to be restarted to 0, because the stage
dropped from 3
back to 0
in row 11.
The expected output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 0
How can I implement this logic?
CodePudding user response:
The expected output is unclear, what about a simple mask
?
Masking only the value during the change:
m = df['stage'].diff().lt(0)
df['acc_hours'] = (df.groupby(m.cumsum())
['diff_hours'].cumsum()
.mask(m, 0)
)
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 6
13 3 0 22 9
14 0 0 22 9
Or ignoring the value completely bu masking before groupby:
m = df['stage'].diff().lt(0)
df['acc_hours'] = (df['diff_hours'].mask(m, 0)
.groupby(m.cumsum())
.cumsum()
)
Output:
diff_hours stage sensor acc_hours
0 0 0 20 0
1 0 0 21 0
2 0 0 21 0
3 1 0 22 1
4 5 0 21 6
5 0 0 22 6
6 0 1 20 6
7 7 1 23 13
8 0 1 24 13
9 0 3 25 13
10 0 3 28 13
11 6 0 21 0
12 0 0 22 0
13 3 0 22 3
14 0 0 22 3