I am trying to parse data from a pandas column containing dicts
into a new column. However, I get a ValueError
when I attempt the following:
import numpy as np
import pandas as pd
d = pd.DataFrame({
    'id': [0, 1, 2],
    'str': [{'a': '1'}, {'a': '2'}, np.nan]
})
d['new_col'] = d.apply(lambda x: d['str'].str['a'] if pd.notnull(x) else x, axis=1)
Traceback:
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py in __nonzero__(self)
1525 @final
1526 def __nonzero__(self):
-> 1527 raise ValueError(
1528 f"The truth value of a {type(self).__name__} is ambiguous. "
1529 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
d.apply(lambda x: d['str'].str['a'] if pd.notnull(x) else x, axis=1)
What is happening is that you are applying the function to each row of the DataFrame d, so x stands for each row (a pd.Series), not each dictionary (or NaN value) of the column 'str'. pd.notnull(x) therefore returns a boolean Series rather than a single True/False, and since the truthiness of a Series is ambiguous, the if check raises the ValueError.
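To see this for yourself, here is a minimal sketch that reuses the DataFrame from the question and inspects what the lambda receives when axis=1. The names row, mask and raised are only illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    'id': [0, 1, 2],
    'str': [{'a': '1'}, {'a': '2'}, np.nan],
})

# With axis=1, apply passes each row to the lambda as a Series.
row = d.iloc[0]
print(type(row))

# pd.notnull works element-wise, so the result is a boolean Series
# with one entry per column, not a single bool.
mask = pd.notnull(row)
print(mask)

# Using that Series in an `if` forces a bool() conversion, which raises.
try:
    bool(mask)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Calling the function on the column itself (d['str'].apply(...)) avoids this, because each x is then a single dict or NaN.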
Try this instead:
d['new_col'] = d['str'].apply(lambda x: x['a'] if pd.notnull(x) else x)
Output:
>>> d
   id         str new_col
0   0  {'a': '1'}       1
1   1  {'a': '2'}       2
2   2         NaN     NaN
CodePudding user response:
Don't use apply, which will be slow on large datasets; use the str accessor instead:
d['new_col'] = d['str'].str['a']
# or
d['new_col'] = d['str'].str.get('a')
Output:
   id         str new_col
0   0  {'a': '1'}       1
1   1  {'a': '2'}       2
2   2         NaN     NaN
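If you want to check that the approaches in this thread agree on your own data, a small comparison like the following may help. It is only a sketch on the question's three-row DataFrame; the variable names via_apply, via_getitem and via_get are made up for illustration.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    'id': [0, 1, 2],
    'str': [{'a': '1'}, {'a': '2'}, np.nan],
})

# Column-wise apply, as in the first answer.
via_apply = d['str'].apply(lambda x: x['a'] if pd.notnull(x) else x)

# The two str-accessor spellings from the second answer.
via_getitem = d['str'].str['a']
via_get = d['str'].str.get('a')

# Print the three results side by side for a visual check.
print(pd.DataFrame({
    'apply': via_apply,
    "str['a']": via_getitem,
    "str.get('a')": via_get,
}))
```

Whether the accessor is noticeably faster than apply depends on your data and pandas version, so it is worth timing both on a realistic sample.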