Pandas apply/lambda on multiple columns

Time:02-24

I have a simple script transforming data in a dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A':[123,None,456],
    'B':[3698,598,None]})

def pad_value(item):
    if item == None or item == np.nan:
        return None
    else:
        return str(item).zfill(7) 
    
df['A'] = df['A'].apply(lambda x:  pad_value(x))
df['B'] = df['B'].apply(lambda x:  pad_value(x))

The above seems to work fine. I have tried rewriting the last two lines to:

cols = ['A', 'B']
df[cols] = df[cols].apply(lambda x:  pad_value(x))

However, this fails and gives a value error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

  1. I am trying to understand why it can't be used in the above way.
  2. My pad_value function seems clunky - I wonder if there is a neater way of achieving the same?

Thanks

CodePudding user response:

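First, test for missing values (NaN or None) with pd.isna. DataFrame.apply passes each whole column (a Series) to the function, not single values, so the if test in pad_value raises the error. For elementwise processing use DataFrame.applymap (renamed to DataFrame.map in pandas 2.1):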

def pad_value(item):
    if pd.isna(item):
        return None
    else:
        return str(item).zfill(7)

cols = ['A', 'B']
df[cols] = df[cols].applymap(pad_value)
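
To see why the original attempt fails, here is a small sketch that records what each kind of apply hands to the function. The variable names (scalar_types, column_types, error_message) are my own and not from the question; it only assumes the sample data above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [123, None, 456],
    'B': [3698, 598, None]})

# Series.apply calls the function once per element, so it receives scalars.
scalar_types = df['A'].apply(lambda x: type(x).__name__).tolist()

# DataFrame.apply calls the function once per column, so it receives a Series.
column_types = df[['A', 'B']].apply(lambda col: type(col).__name__).tolist()

# Comparing a Series gives a boolean Series, which `if` cannot reduce
# to a single True/False. This is the ValueError from the question.
error_message = None
try:
    if df['A'] == None:  # what pad_value effectively does with a whole column
        pass
except ValueError as exc:
    error_message = str(exc)

print(scalar_types)
print(column_types)
print(error_message)
```

So the lambda is not the problem: on a DataFrame, pad_value receives a whole column, and the comparison inside its if statement can no longer be evaluated as a single boolean.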

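Because of the missing values, the sample data is created as floats, so str(item) gives values such as '123.0'. Here is a solution that converts to strings without the trailing .0, turns NaN and None into None, and finishes with Series.str.zfill (which also works with None/NaN):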

df = pd.DataFrame({
    'A':[123,None,456],
    'B':[3698,598,None]})

cols = ['A', 'B']
df[cols] = (df[cols].astype('Int64')               # nullable integers, so no trailing .0
                    .astype(str)
                    .mask(df[cols].isna(), None)   # restore missing values as None
                    .apply(lambda x: x.str.zfill(7)))
print(df)
         A        B
0  0000123  0003698
1     None  0000598
2  0000456     None
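
If you prefer to keep an elementwise function, a possible variant is sketched below. The helper pad_columns and its width parameter are hypothetical names of mine, and it assumes every non-missing value is a whole number, so that int() can safely drop the .0 that the float columns would otherwise add.

```python
import pandas as pd

def pad_columns(frame, cols, width=7):
    """Return a copy of frame with the given columns zero-padded as strings.

    Hypothetical helper: assumes non-missing values are whole numbers.
    """
    out = frame.copy()
    for col in cols:
        # Series.map works per element: missing values become None,
        # everything else is cast to int (dropping .0) and zero-padded.
        out[col] = out[col].map(
            lambda v: None if pd.isna(v) else str(int(v)).zfill(width))
    return out

df = pd.DataFrame({
    'A': [123, None, 456],
    'B': [3698, 598, None]})

result = pad_columns(df, ['A', 'B'])
print(result)
```

This is slower than the vectorized chain above on large frames, but it keeps the padding rule in one small, readable place.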