Home > Software design >  How to get difference between pandas records using dynamic logic and groups?
How to get difference between pandas records using dynamic logic and groups?

Time:10-29

I have a data frame that looks like this:

     GP     LL  A    B
-------------------------
0    A0001  ZZ  100  109
1    A0001  YY  111  113
2    A0001  XX  114  118
3    A0001  WW  119  124
4    A0001  VV  126  138
5    A0001  UU  140  150
6    A0001  TT  152  160
7    A0002  ZZ  101  113
8    A0002  YY  118  121
9    A0002  XX  124  131
...  ...    ... ...  ...

I want to add a new column, "C", which for a given value of "LL" is the difference between "A" in each record and "B" at the given value of "LL", within the same value of "GP".

The values of "LL" don't explicitly repeat for each "GP", but I only expect this to work if the given value of "LL" I am trying to calculate for is present in each "GP".

So the output for the example above would look like this when the given value of "LL" is "XX":

     GP     LL  A    B    C
------------------------------
0    A0001  ZZ  100  109  18
1    A0001  YY  111  113  7
2    A0001  XX  114  118  4
3    A0001  WW  119  124  -1
4    A0001  VV  126  138  -8
5    A0001  UU  140  150  -22
6    A0001  TT  152  160  -34
7    A0002  ZZ  101  113  30
8    A0002  YY  118  121  13
9    A0002  XX  124  131  7
...  ...    ... ...  ...  ...

How can I do this efficiently?

CodePudding user response:

You can mask the column with where, then groupby().transform('first') to propergate the value to the entire data and subtract:

df['C'] = (df['B'].where(df['LL']=='XX')
                  .groupby(df['GP']).transform('first')
                  .sub(df['A'])
          )

Output:

      GP  LL    A    B     C
0  A0001  ZZ  100  109  18.0
1  A0001  YY  111  113   7.0
2  A0001  XX  114  118   4.0
3  A0001  WW  119  124  -1.0
4  A0001  VV  126  138  -8.0
5  A0001  UU  140  150 -22.0
6  A0001  TT  152  160 -34.0
7  A0002  ZZ  101  113  30.0
8  A0002  YY  118  121  13.0
9  A0002  XX  124  131   7.0

CodePudding user response:

Let us try apply with your logic

df['new'] = df.groupby('GP').apply(lambda x : -x['A'] x['B'][x['LL']=='XX'].iloc[0]).values
df
Out[116]: 
      GP  LL    A    B  new
0  A0001  ZZ  100  109   18
1  A0001  YY  111  113    7
2  A0001  XX  114  118    4
3  A0001  WW  119  124   -1
4  A0001  VV  126  138   -8
5  A0001  UU  140  150  -22
6  A0001  TT  152  160  -34
7  A0002  ZZ  101  113   30
8  A0002  YY  118  121   13
9  A0002  XX  124  131    7
  • Related