If I have a dataframe with [Group], [DTime] and [Value] columns
For each [Group] I'm trying to find the difference between the first [Value] and every subsequent value from a set [DTime], for this example say it's the start of the df at 2015-01-01.
Ultimately I would like to plot a timeseries of [Difference] with a trace for each [Group] where every trace begins at 0 on the Y-axis.
I've used this below, which does work to give the difference but I'm not sure how to implement pd.groupby here to get it to repeat for each group rather than start at the very beginning.
df['Difference'] = df.loc[1:, 'Value'] - df.at[0, 'Value']
Group Dtime Value
Grp1 2015-01-01 1261.406773
Grp1 2015-01-02 1252.660231
Grp1 2015-01-03 1223.076426
Grp2 2015-01-01 1214.402352
Grp2 2015-01-02 1422.532532
Grp2 2015-01-03 1262.990213
Is this even the best way to go about this?
CodePudding user response:
Try this:
df['Difference'] = (df['Value'] -
df.sort_values('Dtime').groupby('Group')['Value']
.transform('first'))
Output:
Group Dtime Value Difference
0 Grp1 2015-01-01 1261.406773 0.000000
1 Grp1 2015-01-02 1252.660231 -8.746542
2 Grp1 2015-01-03 1223.076426 -38.330347
3 Grp2 2015-01-01 1214.402352 0.000000
4 Grp2 2015-01-02 1422.532532 208.130180
5 Grp2 2015-01-03 1262.990213 48.587861