How to difference 2 rows in dataframe when condition is met?

Time:04-29

My dataframe:

f1 f2 Condition
2 4 1
1 3 0
4 9 1
9 16 1
16 25 0

I want to compute f1 - f2, where f1 is taken from the row with the first 1 in a run and f2 is taken from the row with the last 0 that follows it. For example, for the first 2 rows I want 2 - 3 = -1. The next result would be 4 - 25 = -21. I can't do a fixed subtraction by shifting the entire f2 column by 2, because the distance between the 1 and the 0 can vary, as it does in this dataframe.
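For anyone who wants to reproduce this, the sample frame can be rebuilt as in the sketch below. The two hand-picked subtractions only restate the expected values from the question; they are not a general solution, and the variable names `first_pair` and `second_pair` are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "f1": [2, 1, 4, 9, 16],
    "f2": [4, 3, 9, 16, 25],
    "Condition": [1, 0, 1, 1, 0],
})

# Expected results, picked by hand from the row positions:
# rows 0..1 form the first run (1, 0), rows 2..4 the second run (1, 1, 0)
first_pair = df.loc[0, "f1"] - df.loc[1, "f2"]   # 2 - 3
second_pair = df.loc[2, "f1"] - df.loc[4, "f2"]  # 4 - 25
print(first_pair, second_pair)
```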

CodePudding user response:

IIUC, you can start a new group at each transition from 0 to 1, then take the first f1 and the last f2 of each group and use them to compute f1 - f2.

With this approach, the first value in a group is necessarily 1 and the last is 0 (except possibly the very first group if the data starts with 0, and the last group if the data ends with 1; how should those cases be handled?).

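# a new group starts wherever Condition goes from 0 to 1 (diff == 1)
group = df['Condition'].diff().eq(1).cumsum()
(df
 .groupby(group)
 .agg({'f1': 'first', 'f2': 'last'})  # first f1 and last f2 of each group
 .eval('f1-f2')
)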
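To see why this grouping key works, it can help to look at each intermediate step on the sample data. This is only a walk-through sketch; the names `step` and `starts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "f1": [2, 1, 4, 9, 16],
    "f2": [4, 3, 9, 16, 25],
    "Condition": [1, 0, 1, 1, 0],
})

# Difference between each row's Condition and the previous row's (NaN for the first row)
step = df["Condition"].diff()

# True only where Condition jumps from 0 to 1, i.e. where a new run begins
starts = step.eq(1)

# Running count of run starts: every row in the same run gets the same label
group = starts.cumsum()

print(pd.DataFrame({"Condition": df["Condition"], "step": step,
                    "starts": starts, "group": group}))
```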

variant:

group = df['Condition'].diff().eq(1).cumsum()
(df
 .groupby(group)
 .apply(lambda d: d['f1'].iloc[0]-d['f2'].iloc[-1])
)

output:

Condition
0    -1
1   -21
dtype: int64
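On the edge cases mentioned earlier (data that starts with 0 or ends with 1), one possible way to handle them is to drop incomplete groups. The sketch below assumes that dropping is the desired behaviour, which the question does not confirm; the frame and the names `block`, `first_cond`, and `last_cond` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a leading 0 row and a trailing 1 row
df = pd.DataFrame({
    "f1": [7, 2, 1, 4],
    "f2": [8, 4, 3, 9],
    "Condition": [0, 1, 0, 1],
})

# Same grouping key as above, renamed to avoid clashing with the Condition column
group = df["Condition"].diff().eq(1).cumsum().rename("block")

summary = df.groupby(group).agg(
    f1=("f1", "first"),
    f2=("f2", "last"),
    first_cond=("Condition", "first"),
    last_cond=("Condition", "last"),
)

# Keep only complete groups: they start with 1 and end with 0
valid = summary[(summary["first_cond"] == 1) & (summary["last_cond"] == 0)]
result = valid["f1"] - valid["f2"]
print(result)
```

Here only the middle group (rows 1 and 2) is complete, so a single value, 2 - 3, remains.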

CodePudding user response:

You can also mark your groups by checking for zeros and taking a reversed cumsum:

g = df['Condition'].eq(0)[::-1].cumsum()
df.groupby(g, as_index=False, sort=False).agg({'f1':'first','f2':'last'}).eval('diff = f1 - f2')

Output:

   f1  f2  diff
0   2   3    -1
1   4  25   -21
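A self-contained sketch of this reverse-cumsum idea, showing the labels it produces. It relies on `groupby` aligning the reversed Series back to the frame by index. The name `block` is illustrative, and the sketch uses named aggregation with the default index rather than `as_index=False`, since whether the key column appears in that result may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "f1": [2, 1, 4, 9, 16],
    "f2": [4, 3, 9, 16, 25],
    "Condition": [1, 0, 1, 1, 0],
})

# Count zeros from the bottom up: each row is labelled with the number of
# zeros at or below it, so a run of 1s shares a label with the 0 that closes it
g = df["Condition"].eq(0)[::-1].cumsum().rename("block")

# Labels put back in original row order, for inspection
labels = g.sort_index().tolist()

# sort=False keeps the groups in order of appearance in df
out = df.groupby(g, sort=False).agg(f1=("f1", "first"), f2=("f2", "last"))
out["diff"] = out["f1"] - out["f2"]
print(out)
```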