I have a simple script transforming data in a dataframe:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [123, None, 456],
    'B': [3698, 598, None]})

def pad_value(item):
    if item == None or item == np.nan:
        return None
    else:
        return str(item).zfill(7)

df['A'] = df['A'].apply(lambda x: pad_value(x))
df['B'] = df['B'].apply(lambda x: pad_value(x))
The above seems to work fine. I have tried rewriting the last two lines to:
cols = ['A', 'B']
df[cols] = df[cols].apply(lambda x: pad_value(x))
However, this fails and gives a value error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
- I am trying to understand why it can't be used in the above way.
- My pad_value function seems clunky - I wonder if there is a neater way of achieving the same?
Thanks
CodePudding user response:
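DataFrame.apply passes each column to the function as a whole Series, so item == None produces a Series of booleans and the if statement raises the ValueError. To test for missing values (NaN or None), use pd.isna; a comparison such as item == np.nan is always False. For elementwise processing, use DataFrame.applymap (renamed to DataFrame.map in pandas 2.1):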
def pad_value(item):
    if pd.isna(item):
        return None
    else:
        return str(item).zfill(7)

cols = ['A', 'B']
df[cols] = df[cols].applymap(pad_value)
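To see why the DataFrame.apply call from the question fails, the sketch below records what each method hands to the function. It is only an illustration: column_types, items and record are hypothetical names, and the elementwise call falls back to applymap on pandas versions that do not have DataFrame.map.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [123, None, 456], 'B': [3698, 598, None]})

# NaN never compares equal to anything, including itself,
# so `item == np.nan` cannot detect a missing value; pd.isna can.
nan_equals_itself = (np.nan == np.nan)
isna_results = (pd.isna(np.nan), pd.isna(None))

# DataFrame.apply calls the function with a whole column (a Series).
column_types = []
df.apply(lambda col: column_types.append(type(col).__name__))

# Using a Series in a boolean context is what raises the error.
try:
    bool(df['A'] == None)
    message = ''
except ValueError as exc:
    message = str(exc)

# Elementwise methods call the function with one scalar at a time.
items = []

def record(item):
    items.append(item)
    return item

elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
elementwise(record)
```

Here column_types contains only 'Series', while items contains plain scalar values, which is why pad_value works elementwise but not when passed to DataFrame.apply.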
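With the sample data the columns are created as floats, so pad_value would produce strings such as 00123.0. Here is a solution that converts the values to strings without the trailing .0, replaces NaN and None with None, and finally applies Series.str.zfill (which also works with None/NaN values):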
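df = pd.DataFrame({
    'A': [123, None, 456],
    'B': [3698, 598, None]})

cols = ['A', 'B']
df[cols] = (df[cols].astype('Int64')           # nullable integers, drops the .0
                    .astype(str)               # convert to strings
                    .mask(df.isna(), None)     # restore missing values as None
                    .apply(lambda x: x.str.zfill(7)))  # pad each column
print(df)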
         A        B
0  0000123  0003698
1     None  0000598
2  0000456     None
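The vectorized approach above can also be wrapped in a small helper for reuse. This is a minimal sketch rather than part of the original answer: the name zero_pad_columns and its width parameter are hypothetical, and it assumes the columns hold integer-valued numbers (astype('Int64') raises on values such as 1.5). Missing values are best checked with pd.isna, since they may be represented as None or NaN.

```python
import pandas as pd

def zero_pad_columns(df, cols, width=7):
    """Return a copy of df with `cols` as zero-padded strings.

    Hypothetical helper; missing values stay missing.
    """
    out = df.copy()
    missing = out[cols].isna()                 # remember where values are missing
    out[cols] = (out[cols].astype('Int64')     # nullable integers, no trailing .0
                          .astype(str)         # convert to strings
                          .mask(missing, None) # put missing values back
                          .apply(lambda s: s.str.zfill(width)))
    return out

df = pd.DataFrame({'A': [123, None, 456], 'B': [3698, 598, None]})
result = zero_pad_columns(df, ['A', 'B'])
```

Working on a copy leaves the original frame untouched, so the numeric columns remain available for further calculations.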