I have seen questions like calculate the difference between rows in DataFrame & i understand Pandas provides df.diff()
API but my question context is slightly different. DataFrame will consist thousands of rows with volume data till that time of the day. Name
column indicates name of instrument. Need to calculate diff between Volume
column only for matching Name
column.
Input DataFrame :
Date Name Volume
1 2011-01-03 A 10
2 2011-01-03 B 20
3 2011-01-03 C 30
4 2011-01-03 A 40
5 2011-01-03 B 30
6 2011-01-03 C 100
7 2011-01-03 A 140
8 2011-01-03 B 50
9 2011-01-03 C 120
Output DataFrame :
Date Name Volume Volume Diff
1 2011-01-03 A 10 10
2 2011-01-03 B 20 20
3 2011-01-03 C 30 30
4 2011-01-03 A 40 30
5 2011-01-03 B 30 10
6 2011-01-03 C 100 70
7 2011-01-03 A 140 100
8 2011-01-03 B 50 20
9 2011-01-03 C 120 20
CodePudding user response:
Question is vague. What if there are multiple dates in a group?
However, with what you have given, group by name, subtract consecutive rows in Name. Some will return nulls, fill those nulls with values from Volume
df['Volume Diff'] = df.groupby('Name')['Volume'].apply(lambda x: x.diff(1)).combine_first(df['Volume'])