I have a data frame that looks like this:
GP LL A B
-------------------------
0 A0001 ZZ 100 109
1 A0001 YY 111 113
2 A0001 XX 114 118
3 A0001 WW 119 124
4 A0001 VV 126 138
5 A0001 UU 140 150
6 A0001 TT 152 160
7 A0002 ZZ 101 113
8 A0002 YY 118 121
9 A0002 XX 124 131
... ... ... ... ...
I want to add a new column, "C", which for a given value of "LL" is the difference between "A" in each record and "B" at the given value of "LL", within the same value of "GP".
The values of "LL" don't explicitly repeat for each "GP", but I only expect this to work if the given value of "LL" I am trying to calculate for is present in each "GP".
So the output for the example above would look like this when the given value of "LL" is "XX":
GP LL A B C
------------------------------
0 A0001 ZZ 100 109 18
1 A0001 YY 111 113 7
2 A0001 XX 114 118 4
3 A0001 WW 119 124 -1
4 A0001 VV 126 138 -8
5 A0001 UU 140 150 -22
6 A0001 TT 152 160 -34
7 A0002 ZZ 101 113 30
8 A0002 YY 118 121 13
9 A0002 XX 124 131 7
... ... ... ... ... ...
How can I do this efficiently?
CodePudding user response:
You can mask the column with where
, then groupby().transform('first')
to propergate the value to the entire data and subtract:
df['C'] = (df['B'].where(df['LL']=='XX')
.groupby(df['GP']).transform('first')
.sub(df['A'])
)
Output:
GP LL A B C
0 A0001 ZZ 100 109 18.0
1 A0001 YY 111 113 7.0
2 A0001 XX 114 118 4.0
3 A0001 WW 119 124 -1.0
4 A0001 VV 126 138 -8.0
5 A0001 UU 140 150 -22.0
6 A0001 TT 152 160 -34.0
7 A0002 ZZ 101 113 30.0
8 A0002 YY 118 121 13.0
9 A0002 XX 124 131 7.0
CodePudding user response:
Let us try apply
with your logic
df['new'] = df.groupby('GP').apply(lambda x : -x['A'] x['B'][x['LL']=='XX'].iloc[0]).values
df
Out[116]:
GP LL A B new
0 A0001 ZZ 100 109 18
1 A0001 YY 111 113 7
2 A0001 XX 114 118 4
3 A0001 WW 119 124 -1
4 A0001 VV 126 138 -8
5 A0001 UU 140 150 -22
6 A0001 TT 152 160 -34
7 A0002 ZZ 101 113 30
8 A0002 YY 118 121 13
9 A0002 XX 124 131 7