Same comparison over two DataFrame columns to form a mask

Time:11-20

I have a pandas DataFrame with columns col1 and col2. I am trying to build col3 as:

df["col3"] = (df["col1"] == 1) | (df["col2"] == 1)

and it works. I tried to rewrite it as:

df["col3"] = any([df[c] == 1 for c in ["col1", "col2"]])

but I get the infamous ValueError: The truth value of a Series is ambiguous ...

I even tried to rewrite any( .. ) as pd.Series( .. ).any(), but it did not work.

How would you do it?

CodePudding user response:

The simplest approach is to select the columns with a list, compare the resulting DataFrame to 1 to get a boolean DataFrame, and then call DataFrame.any along the rows:

(df[["col1", "col2"]] == 1).any(axis=1)
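To see this on concrete data, here is a minimal sketch. The sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"col1": [1, 0, 0, 1], "col2": [0, 1, 0, 1]})

# Compare both columns to 1 at once, then OR the results across each row.
df["col3"] = (df[["col1", "col2"]] == 1).any(axis=1)
```

The result is a boolean Series that shares df's index, so it can be assigned straight back as a new column.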

To make your list-based version work, replace the built-in any with np.logical_or.reduce:

np.logical_or.reduce([df[c] == 1 for c in ["col1", "col2"]])
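One thing to watch with this variant: as far as I can tell, reducing over a plain list hands back a NumPy array rather than a Series, so the row labels are lost. A minimal sketch of wrapping the result back up, assuming made-up sample data with a non-default index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a non-default index.
df = pd.DataFrame({"col1": [1, 0, 0], "col2": [0, 1, 0]}, index=["a", "b", "c"])

# Element-wise OR across the list of boolean Series.
raw = np.logical_or.reduce([df[c] == 1 for c in ["col1", "col2"]])

# Wrap the result so it carries the same row labels as df.
mask = pd.Series(np.asarray(raw), index=df.index)
df["col3"] = mask
```

Assigning the raw array directly also works, because it is matched to the rows by position; the wrapping only matters if you use the mask on its own.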

Or a bit overcomplicated:

pd.concat([df[c] == 1 for c in ["col1", "col2"]], axis=1).any(axis=1)

CodePudding user response:

As was already explained in the comments, the built-in any function implicitly tries (and fails) to convert a Series to bool.
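A minimal sketch that reproduces the failure, assuming a small made-up DataFrame. The built-in any asks each list element for its truth value, and a Series with several elements refuses to give one:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"col1": [1, 0, 0], "col2": [0, 1, 0]})
masks = [df[c] == 1 for c in ["col1", "col2"]]

try:
    # any() evaluates bool(masks[0]), which pandas rejects.
    any(masks)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```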

If you want something similar to your second code snippet, you can use NumPy's any function, which accepts an axis argument so the reduction runs along a single axis instead of over the whole input.

import numpy as np
np.any([df[c] == 1 for c in ["col1", "col2"]], axis=0)
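The choice of axis is what makes this work, so here is a small sketch of what each option gives. It assumes made-up sample data, and the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"col1": [1, 0, 0], "col2": [0, 1, 0]})
masks = [df[c] == 1 for c in ["col1", "col2"]]  # 2 masks, 3 rows each

per_row = np.any(masks, axis=0)     # OR across the masks: one value per row
per_column = np.any(masks, axis=1)  # OR down the rows: one value per mask
overall = np.any(masks)             # no axis: a single boolean
```

Only axis=0 produces a value per row, which is the shape needed for col3.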

Alternatively, you could extend your first code snippet to more columns by using functools.reduce:

import functools
functools.reduce(lambda a, b: a | b, [(df[c] == 1) for c in ['col1', 'col2']])
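The same idea can be written with operator.or_ in place of the lambda. This is a sketch under assumptions: col4 is a hypothetical third column added to show that the pattern scales, and the data is made up.

```python
import functools
import operator

import pandas as pd

# Hypothetical sample data; col4 is not part of the original question.
df = pd.DataFrame(
    {"col1": [1, 0, 0, 0], "col2": [0, 1, 0, 0], "col4": [0, 0, 0, 1]}
)
cols = ["col1", "col2", "col4"]

# Chain the | operator over every per-column mask, left to right.
mask = functools.reduce(operator.or_, (df[c] == 1 for c in cols))
```

Because each step is a Series | Series operation, the result stays a Series with df's index.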