What I'm looking to do is to group by ColA, and compute the ratio between ColB value and the median value in ColB for that group. This ratio will be a new column.
Original DF
ColA ColB
grp1 2
grp1 10
grp1 8
grp1 22
grp2 6
grp2 14
grp2 12
grp2 32
grp3 10
grp3 20
grp3 30
grp3 6
The median for grp1 is 9.0, grp2 is 13.0, grp3 is 15. Thus the new column (ColC) is just the ratio between ColB and the groups median median value. So:
Desired DF
ColA ColB ColC
grp1 2 0.22
grp1 10 1.11
grp1 8 0.89
grp1 22 2.44
grp2 6 0.46
grp2 14 1.07
grp2 12 0.92
grp2 32 2.46
grp3 10 0.66
grp3 20 1.33
grp3 30 2.00
grp3 6 0.40
any suggestions is appreciated!
CodePudding user response:
Try with groupby
:
data["ColC"] = data["ColB"].div(data.groupby("ColA")["ColB"].transform("median"))
>>> data
ColA ColB ColC
0 grp1 2 0.222222
1 grp1 10 1.111111
2 grp1 8 0.888889
3 grp1 22 2.444444
4 grp2 6 0.461538
5 grp2 14 1.076923
6 grp2 12 0.923077
7 grp2 32 2.461538
8 grp3 10 0.666667
9 grp3 20 1.333333
10 grp3 30 2.000000
11 grp3 6 0.400000