I have this code that subtracts, for each person (AAC or AAB), the timepoint 1 data from the timepoint 2 data.
I.e., this is the original data:
pep_seq AAC-T01 AAC-T02 AAB-T01 AAB-T02
0 0 1 2.0 NaN 4.0
1 4 3 2.0 6.0 NaN
2 4 3 NaN 6.0 NaN
3 4 5 2.0 6.0 NaN
This is the code:
import sys
import numpy as np
from sklearn.metrics import auc
import pandas as pd
from numpy import trapz
#read in file
df = pd.DataFrame([[0,1,2,np.nan,4],[4,3,2,6,np.nan],[4,3,np.nan,6,np.nan],[4,5,2,6,np.nan]],columns=['pep_seq','AAC-T01','AAC-T02','AAB-T01','AAB-T02'])
#standardise the data by subtracting each person's T01 column from every sample
df2 = df.drop(['pep_seq'],axis=1)
df2 = df2.apply(lambda x: x.sub(df2[x.name[:4] + "T01"]))
df2.insert(0,'pep_seq',df['pep_seq'])
print(df)
print(df2)
This is the output (i.e. df2):
pep_seq AAC-T01 AAC-T02 AAB-T01 AAB-T02
0 0 0 1.0 NaN NaN
1 4 0 -1.0 0.0 NaN
2 4 0 NaN 0.0 NaN
3 4 0 -3.0 0.0 NaN
...but what I actually wanted was to subtract the T01 columns from all the others EXCEPT for when the T01 value is NaN in which case keep the original value, so the desired output was (see the 4.0 in AAB-T02):
pep_seq AAC-T01 AAC-T02 AAB-T01 AAB-T02
0 0 0 1.0 NaN 4.0
1 4 0 -1.0 0 NaN
2 4 0 NaN 0 NaN
3 4 0 -3.0 0 NaN
Could someone show me where I went wrong? Note that in real life, there are ~100 timepoints per person, not just two.
CodePudding user response:
You can fill the NaN values in the T01 column with 0 when doing the subtraction, so that rows with a missing T01 keep their original value:
df2 = df2.apply(lambda x: x.sub(df2[x.name[:4] + "T01"].fillna(0)))
#                                                      ^^^^^^^^^^ Changes here
df2.insert(0,'pep_seq',df['pep_seq'])
print(df2)
pep_seq AAC-T01 AAC-T02 AAB-T01 AAB-T02
0 0 0 1.0 NaN 4.0
1 4 0 -1.0 0.0 NaN
2 4 0 NaN 0.0 NaN
3 4 0 -3.0 0.0 NaN
CodePudding user response:
I hope I understand you correctly, but numpy.where() should do it for you.
Have a look here: condition based substraction
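A minimal sketch of what that could look like on the example data from the question. The helper name subtract_baseline and its parameters are made up for illustration, and it assumes every column is named <person>-<timepoint> with T01 as the baseline, so it should also cope with more than two timepoints per person:

```python
import numpy as np
import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    [[0, 1, 2, np.nan, 4],
     [4, 3, 2, 6, np.nan],
     [4, 3, np.nan, 6, np.nan],
     [4, 5, 2, 6, np.nan]],
    columns=['pep_seq', 'AAC-T01', 'AAC-T02', 'AAB-T01', 'AAB-T02'])

def subtract_baseline(frame, id_col='pep_seq', baseline='T01'):
    """Hypothetical helper: subtract each person's baseline column,
    keeping the original value wherever the baseline is NaN."""
    out = frame.copy()
    for col in frame.columns.drop(id_col):
        # Assumes column names look like '<person>-<timepoint>'
        person = col.split('-')[0]
        base = frame[f'{person}-{baseline}']
        # Where the baseline is missing keep the value, otherwise subtract
        out[col] = np.where(base.isna(), frame[col], frame[col] - base)
    return out

result = subtract_baseline(df)
print(result)
```

Splitting the column name on '-' instead of slicing the first four characters means the person IDs do not all have to be the same length.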