Calculate difference between current row and latest row satisfying a condition

Time:12-13

I have a pandas DataFrame that looks like this:

    time value group
0   1   12  1
1   2   14  1
2   3   15  2
3   4   15  1
4   5   18  2
5   6   20  1
6   7   19  2
7   8   24  2

I now want to calculate the spread between group 1 and group 2 using their latest values. That is, in each row I want to take the most recent value seen so far for group 1 and for group 2, and calculate value of group 1 - value of group 2.

In the example, the output should look like this:

    time value group diff
0   1   12  1 0
1   2   14  1 0
2   3   15  2 -1
3   4   15  1 0
4   5   18  2 -3
5   6   20  1 2
6   7   19  2 1
7   8   24  2 -4

The only function I could find so far was DataFrame.diff(), but it doesn't satisfy my needs. I would really appreciate some help here. Thanks!

CodePudding user response:

You can forward fill values for group 1 and 2 respectively first and then calculate the difference:

df['diff'] = df.value.where(df.group == 1).ffill() - df.value.where(df.group == 2).ffill()
df
   time  value  group  diff
0     1     12      1   NaN
1     2     14      1   NaN
2     3     15      2  -1.0
3     4     15      1   0.0
4     5     18      2  -3.0
5     6     20      1   2.0
6     7     19      2   1.0
7     8     24      2  -4.0

If you need to replace the NaN values (the rows before group 2 has any value) with 0, use fillna: df['diff'] = df['diff'].fillna(0)
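For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same forward-fill approach, using the data from the question. The helper name latest_group_spread and its parameters (first, second, fill_value) are made up for illustration and are not part of pandas; only where, ffill and fillna are pandas calls.

```python
import pandas as pd


def latest_group_spread(df, first=1, second=2, fill_value=None):
    """Hypothetical helper: latest value of `first` minus latest value of `second`.

    Assumes `df` has 'value' and 'group' columns and is already sorted by time.
    """
    # Keep values only on rows of the given group, then carry the last
    # seen value forward so that every row knows the latest one.
    latest_first = df["value"].where(df["group"] == first).ffill()
    latest_second = df["value"].where(df["group"] == second).ffill()

    spread = latest_first - latest_second

    # Rows before both groups have appeared are NaN; optionally replace them.
    if fill_value is not None:
        spread = spread.fillna(fill_value)
    return spread


# Sample data from the question.
df = pd.DataFrame(
    {
        "time": [1, 2, 3, 4, 5, 6, 7, 8],
        "value": [12, 14, 15, 15, 18, 20, 19, 24],
        "group": [1, 1, 2, 1, 2, 1, 2, 2],
    }
)

df["diff"] = latest_group_spread(df, fill_value=0)
print(df)
```

Because where() introduces NaN, the diff column comes back as float. If the rows are not already in time order, sorting with df.sort_values('time') first would probably be needed for the forward fill to pick up the right "latest" value.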
