Home > Enterprise >  Conditional cumsum and reset to 0
Conditional cumsum and reset to 0

Time:11-21

I read this post but not exactly applicable to my case pandas-resetting cumsum at a specific number

My expected output on column "cumulation":

comment         count   pre_cnt_diff    cumulation
auto 1          10      0               0
auto 2          30      20              20
auto 3          70      40              60
auto 4          120     50              110
auto 5          120     0               110
auto 6          130     10              120
auto 7          150     20              140
manual_input_1  150     0               0
auto 8          200     50              50
auto 9          230     30              80
manual_input_2  230     0               0

My current output on column "cumulation":

comment         count   pre_cnt_diff    cumulation
auto 1          10      0               0
auto 2          30      20              20
auto 3          70      40              60
auto 4          120     50              110
auto 5          120     0               0
auto 6          130     10              10
auto 7          150     20              30
manual_input_1  150     0               0
auto 8          200     50              50
auto 9          230     30              80
manual_input_2  230     0               0

This is my code:

import pandas as pd

d = {'comment': ['auto 1', 'auto 2', 'auto 3', 'auto 4', 'auto 5', 'auto 6', 
'auto 7', 'manual input 1', 'auto 8', 'auto 9', 'manual input 2'],
'count': [10,30,70,120,120,130,150,150,200,230,230]}
df = pd.DataFrame(data=d)

df['pre_cnt_diff'] = df['count'].diff().fillna(0)
g = df.pre_cnt_diff.eq(0).cumsum().fillna(0)
df['cumulation'] = df.groupby(g).pre_cnt_diff.cumsum()

I want the column "cumulation" to reset to 0 when comment has a special pattern of string "manual_input" and continue to cumsum from last 0 value. I think this must to use regex to search my comment column and add if-else statement. Much appreciated if anyone can assist to modify my code.

CodePudding user response:

Create custom groups:

grps = df['comment'].str.contains(r'^manual input').cumsum()

df['cumulation'] = df.groupby(grps)['pre_cont_diff'].cumsum()

Output:

>>> df
           comment  count  pre_cont_diff  cumulation
0           auto 1     10            0.0         0.0
1           auto 2     30           20.0        20.0
2           auto 3     70           40.0        60.0
3           auto 4    120           50.0       110.0
4           auto 5    120            0.0       110.0
5           auto 6    130           10.0       120.0
6           auto 7    150           20.0       140.0
7   manual input 1    150            0.0         0.0
8           auto 8    200           50.0        50.0
9           auto 9    230           30.0        80.0
10  manual input 2    230            0.0         0.0

Details:

>>> pd.concat([df['comment'], grps], axis=1)
           comment  comment
0           auto 1        0
1           auto 2        0
2           auto 3        0
3           auto 4        0
4           auto 5        0
5           auto 6        0
6           auto 7        0
7   manual input 1        1
8           auto 8        1
9           auto 9        1
10  manual input 2        2
  • Related