I'm trying to normalize a column of data to 1 based on an internal standard control across several batches of data. However, I'm struggling to do this natively in pandas and not splitting things into multiple chunks with for loops.
import pandas as pd
Test_Data = {"Sample":["Control","Test1","Test2","Test3","Test4","Control","Test1","Test2","Test3","Test4"],
"Batch":["A","A","A","A","A","B","B","B","B","B"],
"Input":[0.1,0.15,0.08,0.11,0.2,0.15,0.1,0.04,0.11,0.2],
"Output":[0.1,0.6,0.08,0.22,0.01,0.08,0.22,0.02,0.13,0.004]}
DB = pd.DataFrame(Test_Data)
DB.loc[:,"Ratio"] = DB["Output"]/DB["Input"]
DB:
Sample Batch Input Output Ratio
0 Control A 0.10 0.100 1.000000
1 Test1 A 0.15 0.600 4.000000
2 Test2 A 0.08 0.080 1.000000
3 Test3 A 0.11 0.220 2.000000
4 Test4 A 0.20 0.010 0.050000
5 Control B 0.15 0.080 0.533333
6 Test1 B 0.10 0.220 2.200000
7 Test2 B 0.04 0.020 0.500000
8 Test3 B 0.11 0.130 1.181818
9 Test4 B 0.20 0.004 0.020000
My desired output would be to normalize each ratio per Batch based on the Control sample, effectively multiplying all the Batch "B" samples by 1.875.
DB:
Sample Batch Input Output Ratio Norm_Ratio
0 Control A 0.10 0.100 1.000000 1.000000
1 Test1 A 0.15 0.600 4.000000 4.000000
2 Test2 A 0.08 0.080 1.000000 1.000000
3 Test3 A 0.11 0.220 2.000000 2.000000
4 Test4 A 0.20 0.010 0.050000 0.050000
5 Control B 0.15 0.080 0.533333 1.000000
6 Test1 B 0.10 0.220 2.200000 4.125000
7 Test2 B 0.04 0.020 0.500000 0.937500
8 Test3 B 0.11 0.130 1.181818 2.215909
9 Test4 B 0.20 0.004 0.020000 0.037500
I can do this by breaking up the dataframe using for loops and manually extracting the "Control" values, but this is slow and messy for large datasets.
CodePudding user response:
Use where
and groupby.transform
:
DB['Norm_Ratio'] = DB['Ratio'].div(
DB['Ratio'].where(DB['Sample'].eq('Control'))
.groupby(DB['Batch']).transform('first')
)
Output:
Sample Batch Input Output Ratio Norm_Ratio
0 Control A 0.10 0.100 1.000000 1.000000
1 Test1 A 0.15 0.600 4.000000 4.000000
2 Test2 A 0.08 0.080 1.000000 1.000000
3 Test3 A 0.11 0.220 2.000000 2.000000
4 Test4 A 0.20 0.010 0.050000 0.050000
5 Control B 0.15 0.080 0.533333 1.000000
6 Test1 B 0.10 0.220 2.200000 4.125000
7 Test2 B 0.04 0.020 0.500000 0.937500
8 Test3 B 0.11 0.130 1.181818 2.215909
9 Test4 B 0.20 0.004 0.020000 0.037500