Identify the discontinuity and mark it as incremental event in pandas-CodePudding

Given a dataframe, I need to increment the event_id when a discontinuity observed in the column. Here for the given data below, if the difference between the current data and the previous data is >5 then the succeeding column has to be mark with next event_id.

id, data,       event_id,

aa, 2,      1,

aa, 4,      1,

aa, 6,      1,

aa, 12,     2,

aa, 14,     2,

aa, 15,     2,

I tried with below code,

df['pre_data']=df.groupby('id')['data'].shift(1)
df['diff_flag']=np.where((df['data']-df['pre_data'])<5,1,0)

df['event_id']=df['diff_flag'].ne(df.groupby('id')['diff_flag'].shift()).cumsum()

But the code is giving the output of event_id as (1 ,1,1,2,3,3) while the expected output is (1 ,1,1,2,2,2)

CodePudding user response：

if i understand well your problem you need to increase the event_id every time the difference is more than 5. In this case the solution is in your code you just need to change this

df['diff_flag']=np.where((df['data']-df['pre_data'])<5,1,0)

to this:

df['diff_flag']=np.where((df['data']-df['pre_data'])>5,1,0)

and return:

df["diff_flag"].cumsum()

because the numpy where is the oposite of pandas where in numpy we update the value that match the condition