I read this post but not exactly applicable to my case pandas-resetting cumsum at a specific number
My expected output on column "cumulation":
comment count pre_cnt_diff cumulation
auto 1 10 0 0
auto 2 30 20 20
auto 3 70 40 60
auto 4 120 50 110
auto 5 120 0 110
auto 6 130 10 120
auto 7 150 20 140
manual_input_1 150 0 0
auto 8 200 50 50
auto 9 230 30 80
manual_input_2 230 0 0
My current output on column "cumulation":
comment count pre_cnt_diff cumulation
auto 1 10 0 0
auto 2 30 20 20
auto 3 70 40 60
auto 4 120 50 110
auto 5 120 0 0
auto 6 130 10 10
auto 7 150 20 30
manual_input_1 150 0 0
auto 8 200 50 50
auto 9 230 30 80
manual_input_2 230 0 0
This is my code:
import pandas as pd
d = {'comment': ['auto 1', 'auto 2', 'auto 3', 'auto 4', 'auto 5', 'auto 6',
'auto 7', 'manual input 1', 'auto 8', 'auto 9', 'manual input 2'],
'count': [10,30,70,120,120,130,150,150,200,230,230]}
df = pd.DataFrame(data=d)
df['pre_cnt_diff'] = df['count'].diff().fillna(0)
g = df.pre_cnt_diff.eq(0).cumsum().fillna(0)
df['cumulation'] = df.groupby(g).pre_cnt_diff.cumsum()
I want the column "cumulation" to reset to 0 when comment has a special pattern of string "manual_input" and continue to cumsum from last 0 value. I think this must to use regex to search my comment column and add if-else statement. Much appreciated if anyone can assist to modify my code.
CodePudding user response:
Create custom groups:
grps = df['comment'].str.contains(r'^manual input').cumsum()
df['cumulation'] = df.groupby(grps)['pre_cont_diff'].cumsum()
Output:
>>> df
comment count pre_cont_diff cumulation
0 auto 1 10 0.0 0.0
1 auto 2 30 20.0 20.0
2 auto 3 70 40.0 60.0
3 auto 4 120 50.0 110.0
4 auto 5 120 0.0 110.0
5 auto 6 130 10.0 120.0
6 auto 7 150 20.0 140.0
7 manual input 1 150 0.0 0.0
8 auto 8 200 50.0 50.0
9 auto 9 230 30.0 80.0
10 manual input 2 230 0.0 0.0
Details:
>>> pd.concat([df['comment'], grps], axis=1)
comment comment
0 auto 1 0
1 auto 2 0
2 auto 3 0
3 auto 4 0
4 auto 5 0
5 auto 6 0
6 auto 7 0
7 manual input 1 1
8 auto 8 1
9 auto 9 1
10 manual input 2 2