Home > Software design >  Calculating difference between two rows, based on another column match condition, in Python / Pandas
Calculating difference between two rows, based on another column match condition, in Python / Pandas

Time:12-14

I have seen questions like calculate the difference between rows in DataFrame & i understand Pandas provides df.diff() API but my question context is slightly different. DataFrame will consist thousands of rows with volume data till that time of the day. Name column indicates name of instrument. Need to calculate diff between Volume column only for matching Name column.

Input DataFrame :

         Date    Name   Volume
1  2011-01-03     A       10
2  2011-01-03     B       20
3  2011-01-03     C       30
4  2011-01-03     A       40
5  2011-01-03     B       30
6  2011-01-03     C       100
7  2011-01-03     A       140
8  2011-01-03     B       50
9  2011-01-03     C       120

Output DataFrame :

         Date    Name   Volume  Volume Diff
1  2011-01-03     A       10        10
2  2011-01-03     B       20        20
3  2011-01-03     C       30        30
4  2011-01-03     A       40        30
5  2011-01-03     B       30        10
6  2011-01-03     C       100       70
7  2011-01-03     A       140      100
8  2011-01-03     B       50        20
9  2011-01-03     C       120       20

CodePudding user response:

Question is vague. What if there are multiple dates in a group?

However, with what you have given, group by name, subtract consecutive rows in Name. Some will return nulls, fill those nulls with values from Volume

   df['Volume Diff'] = df.groupby('Name')['Volume'].apply(lambda x: x.diff(1)).combine_first(df['Volume'])
  • Related