Exclude column in pandas

Time:10-06

I have a dataframe that looks like this:

ENSG 3dir_S2_S23_L004_R1_001 7dir_S2_S25_L004_R1_001 i3dir_S2_S29_L004_R1_001 i7dir_S2_S31_L004_R1_001
ENSG00000000003.15 349.0 183.0 199.0 165.0
ENSG00000000419.13 133.0 82.0 190.0 168.0
ENSG00000000457.14 62.0 56.0 95.0 111.0
ENSG00000000460.17 191.0 122.0 300.0 285.0
ENSG00000001036.14 507.0 286.0 326.0 317.0
ENSG00000001084.13 205.0 192.0 310.0 320.0
ENSG00000001167.14 406.0 324.0 379.0 309.0
ENSG00000001460.18 93.0 78.0 146.0 120.0

I'm attempting to perform a calculation on each row of each column, excluding the column ENSG.

Something like this, where I divide each row value by the sum of the entire column:

df = df.transform(lambda x: x / x.sum())

How can I exclude the column ENSG from this calculation? Could I use iloc?

CodePudding user response:

Use set_index to move ENSG into the index, which takes it out of the columns, then apply transform and restore ENSG as a column with reset_index:

out = df.set_index('ENSG').transform(lambda x: x / x.sum()).reset_index()
print(out)

# Output:
                 ENSG  3dir_S2_S23_L004_R1_001  7dir_S2_S25_L004_R1_001  i3dir_S2_S29_L004_R1_001  i7dir_S2_S31_L004_R1_001
0  ENSG00000000003.15                 0.179342                 0.138322                  0.102314                  0.091922
1  ENSG00000000419.13                 0.068345                 0.061980                  0.097686                  0.093593
2  ENSG00000000457.14                 0.031860                 0.042328                  0.048843                  0.061838
3  ENSG00000000460.17                 0.098150                 0.092215                  0.154242                  0.158774
4  ENSG00000001036.14                 0.260534                 0.216175                  0.167609                  0.176602
5  ENSG00000001084.13                 0.105344                 0.145125                  0.159383                  0.178273
6  ENSG00000001167.14                 0.208633                 0.244898                  0.194859                  0.172145
7  ENSG00000001460.18                 0.047790                 0.058957                  0.075064                  0.066852
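
To check this approach on a small scale, here is a minimal, self-contained sketch. It uses only the first three rows and two of the columns from the question, and the short column names "a" and "b" are placeholders I made up, not the real sample names. After normalising, each value column should sum to 1.

```python
import pandas as pd

# Toy frame built from the first three rows of the question's data.
# "a" and "b" are hypothetical stand-ins for the long sample column names.
df = pd.DataFrame({
    "ENSG": ["ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14"],
    "a": [349.0, 133.0, 62.0],
    "b": [183.0, 82.0, 56.0],
})

# Move ENSG into the index, divide each remaining column by its own sum,
# then turn ENSG back into an ordinary column.
out = df.set_index("ENSG").transform(lambda x: x / x.sum()).reset_index()

# Sanity check: every normalised column should sum to 1
# (up to floating-point error).
col_sums = out[["a", "b"]].sum()
print(col_sums)
```

Because ENSG sits in the index while transform runs, the lambda never sees it, so no string values end up in the division.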

CodePudding user response:

Assuming ENSG is the first column, yes, you can use iloc:

import numpy as np
df.iloc[:, 1:] = df.iloc[:, 1:] / np.sum(df.iloc[:, 1:], axis=0)
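
The iloc version depends on ENSG being in position 0. If that is not guaranteed, one possible variation is to pick the value columns by name instead. This is only a sketch under that assumption: the toy data, the column names "a" and "b", and the helper variable value_cols are all hypothetical.

```python
import pandas as pd

# Toy frame where ENSG is deliberately NOT the first column.
# "a" and "b" are hypothetical stand-ins for the real sample columns.
df = pd.DataFrame({
    "a": [349.0, 133.0, 62.0],
    "ENSG": ["ENSG00000000003.15", "ENSG00000000419.13", "ENSG00000000457.14"],
    "b": [183.0, 82.0, 56.0],
})

# Select every column except ENSG by name, so column order does not matter.
value_cols = [c for c in df.columns if c != "ENSG"]

# Dividing a DataFrame by a Series aligns the Series index with the
# DataFrame's columns, so each column is divided by its own sum.
df[value_cols] = df[value_cols] / df[value_cols].sum()

print(df)
```

This modifies df in place, like the iloc answer, and leaves ENSG and the column order untouched.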