The following expression yields the boolean value True in pure Python:
6 > 5 > 4
What is the most Pythonic way to chain comparison operators across pandas DataFrame columns in the same way?
The desired output is a new column, greater, as in the example below:
import pandas as pd
df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], index=["a", "b"], columns=["foo", "bar", "baz"])
Comparing two columns works as expected:
df.loc[df.foo > df.bar, "greater"] = "yes"
foo bar baz greater
a 6 5 4 yes
b 1 2 3 NaN
If we try chaining three columns as in the pure Python example:
df.loc[df.foo > df.bar > df.baz, "greater"] = "yes"
This raises the following error. My understanding is that we are comparing the boolean output of the first check with the integers in column baz:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
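A minimal sketch, assuming the corrected two-row DataFrame from the question, that pins down where the error comes from. Python evaluates a chained comparison as (x > y) and (y > z), and the and asks for a single truth value of the first boolean Series via bool(), which pandas refuses. The sketch checks that the chained expression and a plain bool() call fail with the same message.

```python
import pandas as pd

df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], index=["a", "b"], columns=["foo", "bar", "baz"])

chained_error = bool_error = None

# Python rewrites `x > y > z` as `(x > y) and (y > z)`; the `and`
# calls bool() on the first boolean Series, which pandas rejects.
try:
    _ = df.foo > df.bar > df.baz
except ValueError as exc:
    chained_error = str(exc)

# Calling bool() directly on the first comparison triggers the same check.
try:
    bool(df.foo > df.bar)
except ValueError as exc:
    bool_error = str(exc)

print(chained_error == bool_error)
```

If both messages match, the failure happens at the implicit bool() step, before baz is ever compared.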
I can use the following expression, but is there a cleaner way, similar to the pure Python example?
df.loc[(df.foo > df.bar) & (df.bar > df.baz), "greater"] = "yes"
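When more than three columns are involved, the same pairwise & pattern can be folded into a small helper. This is a sketch only; chained_gt is a hypothetical name, not a pandas API. It builds one boolean Series per adjacent pair of columns and combines them with operator.and_, which for Series is the element-wise &. For three columns it produces exactly the mask in the expression above.

```python
import operator
from functools import reduce

import pandas as pd


def chained_gt(frame: pd.DataFrame, cols: list) -> pd.Series:
    """Hypothetical helper: element-wise frame[cols[0]] > frame[cols[1]] > ...

    Requires at least two column names.
    """
    # One boolean Series per adjacent pair of columns.
    pairwise = [frame[left] > frame[right] for left, right in zip(cols, cols[1:])]
    # Combine them with &, the element-wise counterpart of `and`.
    return reduce(operator.and_, pairwise)


df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], index=["a", "b"], columns=["foo", "bar", "baz"])

mask = chained_gt(df, ["foo", "bar", "baz"])
explicit = (df.foo > df.bar) & (df.bar > df.baz)  # the expression from the question
print(mask.equals(explicit))

df.loc[mask, "greater"] = "yes"
print(df)
```

The parentheses in the explicit version are required because & binds more tightly than >.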
Answer:
No, this is not possible. Because you are working with vectors, you need to split the operation in two:
df.loc[(df.foo > df.bar) & (df.bar > df.baz), "greater"] = "yes"
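Why?
In pure Python, a > b > c is shorthand for (a > b) and (b > c), with b evaluated only once. The and short-circuits, so it needs a single truth value from a > b. With arrays there is no single truth value: the shortcut would apply to some elements and not to others, so pandas raises an error instead of guessing.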
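A small pure-Python sketch of that behavior. The helper val is hypothetical and exists only to log which operands get evaluated. It shows that the chained form evaluates the middle operand once, that the spelled-out and form evaluates it twice, and that a False first comparison skips the second one entirely.

```python
calls = []


def val(name, x):
    # Hypothetical helper: record each evaluation, then return the value.
    calls.append(name)
    return x


# Chained form: the middle operand is evaluated only once.
chained = val("a", 6) > val("b", 5) > val("c", 4)
chained_calls = list(calls)

calls.clear()
# Spelled-out form: the middle operand is evaluated twice.
explicit = (val("a", 6) > val("b", 5)) and (val("b", 5) > val("c", 4))
explicit_calls = list(calls)

calls.clear()
# Short-circuit: 1 > 2 is False, so c is never evaluated.
short = val("a", 1) > val("b", 2) > val("c", 3)
short_calls = list(calls)

print(chained, chained_calls)    # True ['a', 'b', 'c']
print(explicit, explicit_calls)  # True ['a', 'b', 'b', 'c']
print(short, short_calls)        # False ['a', 'b']
```

That per-value skip is what cannot be expressed for a whole Series at once.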
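The pandas equivalent, IMO, would be Series.between (the string form of inclusive, here 'neither' for strict bounds on both sides, requires pandas 1.3 or later):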
df.loc[df['bar'].between(df['baz'], df['foo'], inclusive='neither'), 'greater'] = 'yes'
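A runnable sketch of the between approach on the question's data, assuming pandas 1.3 or later for the string value of inclusive. Calling between(left, right) on bar tests left < bar < right, that is baz < bar < foo, which is the same condition as foo > bar > baz. This only covers the three-column case, where one column sits in the middle.

```python
import pandas as pd

df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], index=["a", "b"], columns=["foo", "bar", "baz"])

# bar.between(baz, foo) checks baz < bar < foo element-wise;
# inclusive="neither" makes both bounds strict, matching `>`.
mask = df["bar"].between(df["baz"], df["foo"], inclusive="neither")
df.loc[mask, "greater"] = "yes"
print(df)
```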
Answer:
This is not possible in boolean indexing with loc, but here is a trick using DataFrame.query. The limitation is that the index values must be unique:
import pandas as pd
df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], columns=["foo", "bar", "baz"])
df.loc[df.query('foo > bar > baz').index, "greater"] = "yes"
print(df)
foo bar baz greater
0 6 5 4 yes
1 1 2 3 NaN
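A variation on the same idea, swapping query for DataFrame.eval. The pandas documentation lists chained comparisons as supported syntax in eval/query expressions. Because eval returns the boolean Series itself rather than the matching index labels, the mask can go straight into loc, which sidesteps the unique-index limitation. In the sketch the duplicate index labels are deliberate; with the query(...).index trick, any row sharing a label with a matching row would also be tagged.

```python
import pandas as pd

# Duplicate index labels on purpose, to show the mask does not rely on labels.
df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], index=[0, 0], columns=["foo", "bar", "baz"])

# eval parses the chained comparison like query does, but returns the
# element-wise boolean result instead of filtering rows.
mask = df.eval("foo > bar > baz")
df.loc[mask, "greater"] = "yes"
print(df)
```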