I have a dataframe that looks like this:
ENSG | 3dir_S2_S23_L004_R1_001 | 7dir_S2_S25_L004_R1_001 | i3dir_S2_S29_L004_R1_001 | i7dir_S2_S31_L004_R1_001 |
---|---|---|---|---|
ENSG00000000003.15 | 349.0 | 183.0 | 199.0 | 165.0 |
ENSG00000000419.13 | 133.0 | 82.0 | 190.0 | 168.0 |
ENSG00000000457.14 | 62.0 | 56.0 | 95.0 | 111.0 |
ENSG00000000460.17 | 191.0 | 122.0 | 300.0 | 285.0 |
ENSG00000001036.14 | 507.0 | 286.0 | 326.0 | 317.0 |
ENSG00000001084.13 | 205.0 | 192.0 | 310.0 | 320.0 |
ENSG00000001167.14 | 406.0 | 324.0 | 379.0 | 309.0 |
ENSG00000001460.18 | 93.0 | 78.0 | 146.0 | 120.0 |
I want to apply a calculation to every value in every column except ENSG.
For example, dividing each value by the sum of its column:
df = df.transform(lambda x: x / x.sum())
How can I exclude the column ENSG from this calculation? Could I use iloc?
CodePudding user response:
Use set_index to move ENSG out of the columns, then transform, and reset_index afterwards:
out = df.set_index('ENSG').transform(lambda x: x / x.sum()).reset_index()
print(out)
# Output:
ENSG 3dir_S2_S23_L004_R1_001 7dir_S2_S25_L004_R1_001 i3dir_S2_S29_L004_R1_001 i7dir_S2_S31_L004_R1_001
0 ENSG00000000003.15 0.179342 0.138322 0.102314 0.091922
1 ENSG00000000419.13 0.068345 0.061980 0.097686 0.093593
2 ENSG00000000457.14 0.031860 0.042328 0.048843 0.061838
3 ENSG00000000460.17 0.098150 0.092215 0.154242 0.158774
4 ENSG00000001036.14 0.260534 0.216175 0.167609 0.176602
5 ENSG00000001084.13 0.105344 0.145125 0.159383 0.178273
6 ENSG00000001167.14 0.208633 0.244898 0.194859 0.172145
7 ENSG00000001460.18 0.047790 0.058957 0.075064 0.066852
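To check this on a smaller frame, here is a minimal self-contained sketch. It uses the first three rows of the question's data, and the short column names sample_a and sample_b are hypothetical stand-ins for the real sample names. After the transform, each numeric column should sum to 1.

```python
import pandas as pd

# Stand-in for the question's data; sample_a / sample_b are hypothetical names.
df = pd.DataFrame({
    "ENSG": ["ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14"],
    "sample_a": [349.0, 133.0, 62.0],
    "sample_b": [183.0, 82.0, 56.0],
})

# ENSG becomes the index, so transform only sees the numeric columns.
out = df.set_index("ENSG").transform(lambda x: x / x.sum()).reset_index()

# Sanity check: every normalized column should add up to 1.
col_sums = out.drop(columns="ENSG").sum()
print(col_sums)
```

Note that df itself is left unchanged here; the normalized values live in out.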
CodePudding user response:
Assuming ENSG is the first column, yes, you can use iloc:
import numpy as np
df.iloc[:, 1:] = df.iloc[:, 1:] / np.sum(df.iloc[:, 1:], axis=0)
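A rough self-contained version of the same idea, using the pandas sum method instead of NumPy, could look like the sketch below. It assumes ENSG is the first column and that the counts are already floats, as in the question; the column names sample_a and sample_b are hypothetical. Unlike the set_index approach, this overwrites the values in df in place.

```python
import pandas as pd

# Stand-in for the question's data; sample_a / sample_b are hypothetical names.
df = pd.DataFrame({
    "ENSG": ["ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14"],
    "sample_a": [349.0, 133.0, 62.0],
    "sample_b": [183.0, 82.0, 56.0],
})

# Everything except the first column (assumed to be ENSG).
counts = df.iloc[:, 1:]

# Dividing a DataFrame by a Series aligns on column labels,
# so each column is divided by its own sum.
df.iloc[:, 1:] = counts / counts.sum(axis=0)

print(df)
```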