Pandas assign with apply lambda multiple columns with condition

Time:08-19

I'm looking for the correct way to copy each row's label into the column that matches its labItemsNameRef value, but I can't get my code working. Is there a solution?

MY DATAFRAME

        labItemsNameRef     label
0       FBS                 decrease
1       FBS                 decrease
2       FBS                 increase
3       HbA1c               decrease
4       Creatinine          changeless
...    ...                  ...
123901  FBS                 decrease
123902  HbA1c               increase
123903  Micro Creatinine    changeless
123904  DTX ก่อนอาหาร       increase
123905  Urine Creatinine    changeless

MY CODE

df = df.assign(
     FBS = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'FBS'),
     HbA1c = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'HbA1c'),
     DTX = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'DTX'),
     BUN = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'BUN'),
     Creatinine = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'Creatinine'))

But I get this error:

    FBX = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'FBX'),
                                                                                   ^
SyntaxError: invalid syntax

EXPECTED OUTPUT

        labItemsNameRef   label       FBS       HbA1c     Creatinine  BUN  DTX
0       FBS               decrease    decrease  NaN       NaN         NaN  NaN
1       FBS               decrease    decrease  NaN       NaN         NaN  NaN
2       FBS               increase    increase  NaN       NaN         NaN  NaN
3       HbA1c             decrease    NaN       decrease  NaN         NaN  NaN
4       Creatinine        changeless  NaN       NaN       changeless  NaN  NaN
...     ...               ...         ...       ...       ...         ...  ...
123901  FBS               decrease    decrease  NaN       NaN         NaN  NaN
123902  HbA1c             increase    NaN       increase  NaN         NaN  NaN
123903  Micro Creatinine  changeless  NaN       NaN       NaN         NaN  NaN
123904  DTX ก่อนอาหาร      increase    NaN       NaN       NaN         NaN  NaN
123905  Urine Creatinine  changeless  NaN       NaN       NaN         NaN  NaN

CodePudding user response:

Use get_dummies to build a boolean indicator column for each value of labItemsNameRef, then use numpy.where to put the label where the indicator is True and NaN elsewhere:

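import numpy as np
import pandas as pd

# m has one boolean column per distinct labItemsNameRef value
m = pd.get_dummies(df['labItemsNameRef'], dtype=bool)
df[m.columns] = np.where(m, df[['label']], np.nan)
print(df)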
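
Note that get_dummies creates a column for every distinct name in labItemsNameRef, including ones like Micro Creatinine, while the expected output only shows five columns. The following is a minimal sketch of one way to restrict the result to those five, assuming a small stand-in frame; the names wanted and out and the sample rows are illustrative, not part of the original data:

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame (rows are illustrative).
df = pd.DataFrame({
    "labItemsNameRef": ["FBS", "HbA1c", "Creatinine", "Micro Creatinine"],
    "label": ["decrease", "increase", "changeless", "changeless"],
})

# Columns wanted in the result, taken from the question's expected output.
wanted = ["FBS", "HbA1c", "Creatinine", "BUN", "DTX"]

# Indicator matrix limited to the wanted names. Names that never occur in
# the data (here BUN and DTX) become all-False columns.
m = (pd.get_dummies(df["labItemsNameRef"], dtype=bool)
       .reindex(columns=wanted, fill_value=False))

# Broadcast the single label column across the indicator matrix:
# label where the indicator is True, NaN elsewhere.
values = np.where(m, df[["label"]], np.nan)

# Attach the new columns next to the original ones.
out = df.join(pd.DataFrame(values, index=df.index, columns=wanted))
print(out)
```

Because the comparison is an exact match on the name, rows such as Micro Creatinine stay NaN in the Creatinine column, which agrees with the expected output.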

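Your solution raises SyntaxError because a conditional expression (A if condition else B) requires the else part. It is also slow, because apply loops over the rows in Python, but it works once you add an else branch and axis=1: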

df = df.assign(FBS = lambda df: df.apply(lambda x: x['label'] if x['labItemsNameRef'] == 'FBS' else np.nan, axis=1))
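
If you prefer to stay with assign, a vectorized alternative to the row-wise apply is Series.where, which keeps the label where the condition holds and leaves NaN elsewhere. This is only a sketch under assumptions: the sample frame and the names wanted, new_cols, out and slow are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame (rows are illustrative).
df = pd.DataFrame({
    "labItemsNameRef": ["FBS", "FBS", "HbA1c", "Urine Creatinine"],
    "label": ["decrease", "increase", "decrease", "changeless"],
})

# Column names taken from the question's assign call.
wanted = ["FBS", "HbA1c", "DTX", "BUN", "Creatinine"]

# One new column per name: label where the name matches, NaN otherwise.
new_cols = {
    name: df["label"].where(df["labItemsNameRef"] == name)
    for name in wanted
}
out = df.assign(**new_cols)
print(out)

# The row-wise version from the answer, for a single column, for comparison.
slow = df.apply(
    lambda x: x["label"] if x["labItemsNameRef"] == "FBS" else np.nan,
    axis=1,
)
```

Both versions should give the same FBS column; the where version avoids calling a Python function once per row, which matters on a frame with over 100,000 rows.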