I have a pandas DataFrame that looks like this:
time value group
0 1 12 1
1 2 14 1
2 3 15 2
3 4 15 1
4 5 18 2
5 6 20 1
6 7 19 2
7 8 24 2
I now want to calculate the spread between group 1 and group 2 using the latest values. That is, in each row I want to take the most recent value seen so far for group 1 and for group 2 and calculate (value of group 1) - (value of group 2).
In the example, the output should look like this:
time value group diff
0 1 12 1 0
1 2 14 1 0
2 3 15 2 -1
3 4 15 1 0
4 5 18 2 -3
5 6 20 1 2
6 7 19 2 1
7 8 24 2 -4
The only function I have found so far is DataFrame.diff(), but it doesn't do what I need. I would really appreciate some help here. Thanks!
CodePudding user response:
You can first forward-fill the values for group 1 and group 2 separately and then calculate the difference:
df['diff'] = df.value.where(df.group == 1).ffill() - df.value.where(df.group == 2).ffill()
df
time value group diff
0 1 12 1 NaN
1 2 14 1 NaN
2 3 15 2 -1.0
3 4 15 1 0.0
4 5 18 2 -3.0
5 6 20 1 2.0
6 7 19 2 1.0
7 8 24 2 -4.0
If you need to replace the NaN values, use fillna:
df['diff'] = df['diff'].fillna(0)
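For reference, here is a minimal self-contained sketch of the same approach applied to the example data from the question. The intermediate names latest_g1 and latest_g2 are only illustrative and are not part of the original answer; the final cast to int assumes you want integer output like the table in the question.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7, 8],
    "value": [12, 14, 15, 15, 18, 20, 19, 24],
    "group": [1, 1, 2, 1, 2, 1, 2, 2],
})

# Keep each group's values only on its own rows (NaN elsewhere),
# then forward-fill so every row carries the latest value seen so far.
latest_g1 = df["value"].where(df["group"] == 1).ffill()
latest_g2 = df["value"].where(df["group"] == 2).ffill()

# Spread between the latest values; rows before both groups have
# appeared are NaN, which is replaced by 0 here (an assumption that
# matches the desired output in the question).
df["diff"] = (latest_g1 - latest_g2).fillna(0).astype(int)

print(df)
```

The first two rows end up as 0 only because of the fillna step; without it they stay NaN, since group 2 has no value yet at that point.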