I'm trying to create a new column based on a conditional subtraction in python. I want to first group the dataframe by column A, then take the row value of C where B equals 2, and subtract that value from all values in column C.
import pandas as pd
data = [
["R", 1, 2],
["R", 2, 4],
["R", 3, 6],
["R", 4, 8],
["S", 0, 5],
["S", 1, 4],
["S", 2, 1],
["S", 3, 3]]
df = pd.DataFrame(data=data, columns=["a", "b", "c"])
df
Out[1]:
a b c
0 R 1 2
1 R 2 4
2 R 3 6
3 R 4 8
4 S 0 5
5 S 1 4
6 S 2 1
7 S 3 3
Want it to yield the column 'd' of:
Out[2]:
a b c d
0 R 1 2 -2
1 R 2 4 0
2 R 3 6 2
3 R 4 8 4
4 S 0 5 4
5 S 1 4 3
6 S 2 1 0
7 S 3 3 2
I would appreciate any tips of how to use transform or map functions to solve the problem.
CodePudding user response:
It's easiest to temporarily set the index to a
. Then you can do the subtraction as usual while pandas will automatically align the index. Finally reset the index.
df1 = df.set_index('a')
df1['d'] = df1.c - df1.loc[df1.b.eq(2), 'c']
df1.reset_index()
Result:
a b c d
0 R 1 2 -2
1 R 2 4 0
2 R 3 6 2
3 R 4 8 4
4 S 0 5 4
5 S 1 4 3
6 S 2 1 0
7 S 3 3 2
(this of course only works if there is no more than one row for each value in a
where b
equals 2
)