Subtract one column from another in pandas - with a condition

Time:04-20

I have this code that subtracts, for each person (AAC or AAB), the timepoint 1 data from the timepoint 2 data.

This is the original data:

   pep_seq  AAC-T01  AAC-T02  AAB-T01  AAB-T02
0        0        1      2.0      NaN      4.0
1        4        3      2.0      6.0      NaN
2        4        3      NaN      6.0      NaN
3        4        5      2.0      6.0      NaN

This is the code:

import sys
import numpy as np
from sklearn.metrics import auc
import pandas as pd
from numpy import trapz

#read in file
df = pd.DataFrame([[0,1,2,np.nan,4],[4,3,2,6,np.nan],[4,3,np.nan,6,np.nan],[4,5,2,6,np.nan]],columns=['pep_seq','AAC-T01','AAC-T02','AAB-T01','AAB-T02'])

#standardise the data by taking T0 away from each sample
df2 = df.drop(['pep_seq'],axis=1)
df2 = df2.apply(lambda x: x.sub(df2[x.name[:4] + "T01"]))
df2.insert(0,'pep_seq',df['pep_seq'])

print(df)
print(df2)

This is the output (i.e. df2)

   pep_seq  AAC-T01  AAC-T02  AAB-T01  AAB-T02
0        0        0      1.0      NaN      NaN
1        4        0     -1.0      0.0      NaN
2        4        0      NaN      0.0      NaN
3        4        0     -3.0      0.0      NaN

...but what I actually wanted was to subtract the T01 columns from all the others EXCEPT for when the T01 value is NaN in which case keep the original value, so the desired output was (see the 4.0 in AAB-T02):

   pep_seq  AAC-T01  AAC-T02  AAB-T01  AAB-T02
0        0        0      1.0      NaN      4.0
1        4        0     -1.0      0.0      NaN
2        4        0      NaN      0.0      NaN
3        4        0     -3.0      0.0      NaN

Could someone show me where I went wrong? Note that in real life, there are ~100 timepoints per person, not just two.

CodePudding user response:

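You can fill the NaN values in the T01 column with 0 when doing the subtraction. Subtracting NaN always gives NaN, whereas subtracting 0 leaves the original value unchanged: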

df2 = df2.apply(lambda x: x.sub(df2[x.name[:4] + "T01"].fillna(0)))
#                                                       ^^^^^^^^^ Changes here
df2.insert(0,'pep_seq',df['pep_seq'])
print(df2)

   pep_seq  AAC-T01  AAC-T02  AAB-T01  AAB-T02
0        0        0      1.0      NaN      4.0
1        4        0     -1.0      0.0      NaN
2        4        0      NaN      0.0      NaN
3        4        0     -3.0      0.0      NaN
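For reference, here is a self-contained sketch of the same fillna(0) idea that can be run on its own. The helper name subtract_baseline and its parameters are made up for illustration, and it assumes every column is named "<person>-<timepoint>" with a "-T01" baseline column per person. Because it subtracts the baseline from every column of that person, it should also cover the case with ~100 timepoints.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, np.nan, 4],
     [4, 3, 2, 6, np.nan],
     [4, 3, np.nan, 6, np.nan],
     [4, 5, 2, 6, np.nan]],
    columns=["pep_seq", "AAC-T01", "AAC-T02", "AAB-T01", "AAB-T02"],
)


def subtract_baseline(frame, id_col="pep_seq", baseline="T01"):
    """Subtract each person's baseline column from all of that person's columns.

    Hypothetical helper: assumes column names look like '<person>-<timepoint>'.
    A NaN baseline is treated as 0, so the original value is kept.
    """
    values = frame.drop(columns=[id_col])

    def adjust(col):
        person = col.name.split("-")[0]          # e.g. 'AAC' from 'AAC-T02'
        base = values[f"{person}-{baseline}"].fillna(0)
        return col.sub(base)

    out = values.apply(adjust)
    out.insert(0, id_col, frame[id_col])
    return out


result = subtract_baseline(df)
print(result)
```

Splitting the column name on "-" instead of slicing the first four characters is a design choice of this sketch, so that person identifiers of other lengths would still work.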

CodePudding user response:

I hope I have understood you correctly, but numpy.where() should do this for you.

Have a look here: condition based subtraction
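As a rough sketch of what the numpy.where() approach could look like on the data from the question (this is one possible reading of the suggestion, not code from the linked page): where the person's T01 value is NaN, keep the original value, otherwise subtract T01. The variable names values and adjusted are chosen here for illustration, and the "<person>-T01" naming is assumed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, np.nan, 4],
     [4, 3, 2, 6, np.nan],
     [4, 3, np.nan, 6, np.nan],
     [4, 5, 2, 6, np.nan]],
    columns=["pep_seq", "AAC-T01", "AAC-T02", "AAB-T01", "AAB-T02"],
)

values = df.drop(columns=["pep_seq"])
adjusted = values.copy()

for name in values.columns:
    # Baseline column for this person, e.g. 'AAC-T01' for 'AAC-T02'
    base = values[name.split("-")[0] + "-T01"]
    # Keep the original value where the baseline is NaN, else subtract it
    adjusted[name] = np.where(base.isna(), values[name], values[name] - base)

adjusted.insert(0, "pep_seq", df["pep_seq"])
print(adjusted)
```

The baselines are always read from the untouched values frame, so the order in which the columns are processed does not matter.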