I'm trying to create a new column based on a conditional subtraction in python. I want to first group the dataframe by column A and D, then take the row value of C where B equals 2, and subtract that value from all values in column C.
import pandas as pd
data = [
["R", 1, 2, "p"],
["R", 2, 4, "p"],
["R", 3, 6, "p"],
["R", 4, 8, "p"],
["R", 1, 6, "o"],
["R", 2, 3, "o"],
["R", 3, 1, "o"],
["R", 4, 2, "o"],
["S", 0, 5, "n"],
["S", 1, 4, "n"],
["S", 2, 1, "n"],
["S", 3, 3, "n"],
["S", 0, 3, "g"],
["S", 1, 2, "g"],
["S", 2, 9, "g"],
["S", 3, 7, "g"]]
df = pd.DataFrame(data=data, columns=["a", "b", "c", "d"])
df
Out[1]:
a b c d
0 R 1 2 p
1 R 2 4 p
2 R 3 6 p
3 R 4 8 p
4 R 1 6 o
5 R 2 3 o
6 R 3 1 o
7 R 4 2 o
8 S 0 5 n
9 S 1 4 n
10 S 2 1 n
11 S 3 3 n
12 S 0 3 g
13 S 1 2 g
14 S 2 9 g
15 S 3 7 g
Want it to yield the column 'e' of:
Out[2]:
a b c d e
0 R 1 2 p -2
1 R 2 4 p 0
2 R 3 6 p 2
3 R 4 8 p 4
4 R 1 6 o 3
5 R 2 3 o 0
6 R 3 1 o -2
7 R 4 2 o -1
8 S 0 5 n 4
9 S 1 4 n 3
10 S 2 1 n 0
11 S 3 3 n 2
12 S 0 3 g -6
13 S 1 2 g -7
14 S 2 9 g 0
15 S 3 7 g -2
I would appreciate any tips of how to use transform or map functions to solve the problem.
CodePudding user response:
IIUC, you can use a mask before using groupby.transform('first')
:
df['e'] = df['c'] - (df['c'].where(df['b'].eq(2))
.groupby([df['a'], df['d']])
.transform('first')
.convert_dtypes()
)
output:
a b c d e
0 R 1 2 p -2
1 R 2 4 p 0
2 R 3 6 p 2
3 R 4 8 p 4
4 R 1 6 o 3
5 R 2 3 o 0
6 R 3 1 o -2
7 R 4 2 o -1
8 S 0 5 n 4
9 S 1 4 n 3
10 S 2 1 n 0
11 S 3 3 n 2
12 S 0 3 g -6
13 S 1 2 g -7
14 S 2 9 g 0
15 S 3 7 g -2