Here is my dataframe:
import pandas as pd
my_df = pd.DataFrame({'col_1': [1,2,5,6,4,3,7,8,9,11,14]})
my_df['diff'] = my_df['col_1'].diff()
my_df['cond_sum'] = [None, 1,4,5,-2,-3,4,5,6,8,11]
My original column is col_1
. My goal is to obtain column cond_sum
. diff
column is difference between consecutive rows. cond_sum
is cumulative sum of diff
column which resets every time there is change in sign.
I can clearly write a function and do it on row by row basis, but I would like something faster.
CodePudding user response:
Get the sign of your diff, then create a groupby statement and calculate the cumsum
import numpy as np
import pandas as pd
my_df = pd.DataFrame({'col_1': [1,2,5,6,4,3,7,8,9,11,14]})
my_df['diff'] = my_df['col_1'].diff()
sign = np.sign(my_df['diff'])
groupby_ = (sign != sign.shift(1)).cumsum()
my_df['cond_sum'] = my_df.groupby(groupby_)['diff'].cumsum()