I have a pandas DataFrame with columns col1 and col2. I am trying to build col3 as:
df["col3"] = (df["col1"] == 1) | (df["col2"] == 1)
and it works. I tried to rewrite it as:
df["col3"] = any([df[c] == 1 for c in ["col1", "col2"]])
but I get the infamous ValueError: The truth value of a Series is ambiguous ...
I even tried rewriting any( .. ) as pd.Series( .. ).any(), but that did not work either.
How would you do it?
CodePudding user response:
The simplest way is to select the columns with a list, compare them to 1 to get a boolean DataFrame, and then call DataFrame.any with axis=1:
(df[["col1", "col2"]] == 1).any(axis=1)
Your solution can be fixed by replacing any with np.logical_or.reduce (note that this returns a NumPy array rather than a Series):
np.logical_or.reduce([df[c] == 1 for c in ["col1", "col2"]])
Or, a bit overcomplicated, with pd.concat:
pd.concat([df[c] == 1 for c in ["col1", "col2"]], axis=1).any(axis=1)
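As a quick sanity check, the three variants above can be compared against the original two-column expression. This is only a sketch: the sample values in df are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not from the question).
df = pd.DataFrame({"col1": [1, 0, 0, 2], "col2": [0, 1, 0, 1]})
cols = ["col1", "col2"]

# Reference result: the expression from the question that already works.
expected = (df["col1"] == 1) | (df["col2"] == 1)

# 1. Boolean DataFrame, then a row-wise any.
via_any = (df[cols] == 1).any(axis=1)

# 2. Element-wise OR across the list of masks; the result is a NumPy array.
via_reduce = np.logical_or.reduce([df[c] == 1 for c in cols])

# 3. Concatenate the masks side by side, then a row-wise any.
via_concat = pd.concat([df[c] == 1 for c in cols], axis=1).any(axis=1)

print(expected.tolist())    # [True, True, False, True]
print(via_any.tolist())     # [True, True, False, True]
print(via_reduce.tolist())  # [True, True, False, True]
print(via_concat.tolist())  # [True, True, False, True]
```

Any of the three can be assigned to df["col3"]. The NumPy array from the second variant is assigned by position, since it carries no index.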
CodePudding user response:
As was already explained in the comments, the built-in any function implicitly tries (and fails) to convert each Series to bool.
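A minimal sketch of that failure, using a made-up three-row frame (the sample values are an assumption, not from the question):

```python
import pandas as pd

# Hypothetical sample data (an assumption, not from the question).
df = pd.DataFrame({"col1": [1, 0, 0], "col2": [0, 1, 0]})

# A plain Python list holding two boolean Series.
masks = [df[c] == 1 for c in ["col1", "col2"]]

# The built-in any() asks each list element for its truth value.
# Each element is a whole Series, and pandas refuses to collapse a
# Series into a single True/False, so a ValueError is raised.
try:
    any(masks)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```

The built-in any never looks inside the Series element by element, which is why an element-wise reduction is needed instead.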
If you want something similar to your second code snippet, you can use NumPy's any function, which accepts an axis argument and so can reduce along a single axis:
import numpy as np
np.any([df[c] == 1 for c in ["col1", "col2"]], axis=0)
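To illustrate why axis=0 matters in the call above, here is a small sketch on made-up data (the values in df are an assumption). Without an axis, np.any collapses everything into a single value; with axis=0 it combines the masks row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not from the question).
df = pd.DataFrame({"col1": [1, 0, 0], "col2": [0, 1, 0]})
masks = [df[c] == 1 for c in ["col1", "col2"]]

# The list of masks is treated as a 2-D array of shape (2, 3):
# one row per column name, one column per DataFrame row.
per_row = np.any(masks, axis=0)  # reduce across the masks, one result per row
overall = np.any(masks)          # reduce over everything, a single boolean

print(per_row.tolist())  # [True, True, False]
print(bool(overall))     # True

# The per-row array has the same length as df, so it can become col3.
df["col3"] = per_row
```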
Alternatively, you could extend your first code snippet to more columns by using functools.reduce:
import functools
functools.reduce(lambda a, b: a | b, [(df[c] == 1) for c in ['col1', 'col2']])
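The same idea can be written with operator.or_ in place of the lambda. The sketch below assumes a hypothetical third column, col4, and made-up values, just to show that the list of column names can grow without changing the expression.

```python
import functools
import operator

import pandas as pd

# Hypothetical sample data; col4 is an assumed extra column.
df = pd.DataFrame({
    "col1": [1, 0, 0, 0],
    "col2": [0, 1, 0, 0],
    "col4": [0, 0, 1, 0],
})
cols = ["col1", "col2", "col4"]

# operator.or_(a, b) is the function form of a | b, so this chains
# (df["col1"] == 1) | (df["col2"] == 1) | (df["col4"] == 1).
combined = functools.reduce(operator.or_, (df[c] == 1 for c in cols))

print(combined.tolist())  # [True, True, True, False]
```

Unlike the NumPy-based variants, the result here is a pandas Series that keeps the index of df.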