Chaining comparison operators in a Pandas DataFrame

Time:07-07

The following expression yields the boolean value True using pure Python:

6 > 5 > 4

What is the most Pythonic way to chain comparison operators in a Pandas DataFrame, similar to the above?

The desired output is a new column, greater, as in the example below:

df = pd.DataFrame([[6,5,4],[1,2,3]],index=["a","b"],columns=["foo","bar","baz"])

Comparing two columns works as expected:

df.loc[df.foo > df.bar, "greater"] = "yes"


   foo  bar  baz greater
a    6    5    4     yes
b    1    2    3     NaN

If we try chaining three columns, as in the pure Python example:

df.loc[df.foo > df.bar > df.baz, "greater"] = "yes"

This raises the following error. My understanding is that we are trying to compare the boolean output of the first check with the integers in column baz:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I can use the following expression but is there a cleaner way similar to the pure Python example?

df.loc[(df.foo > df.bar) & (df.bar > df.baz), "greater"] = "yes"

CodePudding user response:

No, this is not possible. Because you are working with vectors, you need to split the operation in two:

df.loc[(df.foo > df.bar) & (df.bar > df.baz), "greater"] = "yes"

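Why? In pure Python, a > b > c is evaluated as (a > b) and (b > c), with short-circuiting: if a > b is false, b > c is never evaluated. That cannot work with arrays, because the shortcut might apply to some elements and not to others. When "and" asks the Series produced by a > b for a single truth value, pandas raises the "truth value of a Series is ambiguous" error instead of guessing.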
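To make the failure concrete, here is a minimal sketch using the question's DataFrame. The names chained_error and mask are only illustrative. It shows that the chained form trips over the implicit "and", while the element-wise & operator does not.

```python
import pandas as pd

df = pd.DataFrame([[6, 5, 4], [1, 2, 3]], index=["a", "b"],
                  columns=["foo", "bar", "baz"])

# Python evaluates `x > y > z` as `(x > y) and (y > z)`.
# `and` needs one True/False from its left operand, so bool() is called
# on the Series returned by `df.foo > df.bar`, which raises ValueError.
try:
    df.foo > df.bar > df.baz
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

print(chained_error)

# The element-wise `&` operator never asks for a single truth value.
mask = (df.foo > df.bar) & (df.bar > df.baz)
print(mask.tolist())  # [True, False]
```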

The pandas equivalent IMO would be between:

df.loc[df['bar'].between(df['baz'], df['foo'], inclusive='neither'), 'greater'] = 'yes'
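As a quick sanity check, the sketch below compares the between form with the explicit & form on a small frame that also includes a tie (foo == bar). It assumes a pandas version where inclusive accepts the string 'neither' (1.3 or later, as far as I know); the third row is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[6, 5, 4], [1, 2, 3], [5, 5, 4]],
                  columns=["foo", "bar", "baz"])

# Strict chain foo > bar > baz, written two ways.
mask_and = (df["foo"] > df["bar"]) & (df["bar"] > df["baz"])
mask_between = df["bar"].between(df["baz"], df["foo"], inclusive="neither")

print(mask_between.tolist())                     # [True, False, False]
print(mask_and.tolist() == mask_between.tolist())  # True
```

Note that between only covers a chain of exactly three operands that all use the same kind of bound; longer or mixed chains still need the & form.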

CodePudding user response:

This is not possible in boolean indexing with loc, but here is a trick using DataFrame.query. The limitation is that the index values must be unique:

df = pd.DataFrame([[6,5,4],[1,2,3]],columns=["foo","bar","baz"])

df.loc[df.query('foo > bar > baz').index, "greater"] = "yes"
print(df)
   foo  bar  baz greater
0    6    5    4     yes
1    1    2    3     NaN
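A possible variation, offered as a sketch rather than a definitive fix: DataFrame.eval appears to use the same expression parser as query, so it should accept the chained comparison too, but it returns a boolean mask instead of the filtered rows. Passing that mask to loc avoids the lookup by index label, so duplicate index values should not be a problem. The duplicated index and the third row below are made up for illustration.

```python
import pandas as pd

# Duplicate index labels on purpose: selecting rows by label would be ambiguous here.
df = pd.DataFrame([[6, 5, 4], [1, 2, 3], [9, 8, 7]],
                  index=["a", "a", "b"],
                  columns=["foo", "bar", "baz"])

# eval returns the element-wise result: a boolean Series aligned with df.
mask = df.eval("foo > bar > baz")
print(mask.tolist())  # [True, False, True]

# A boolean mask selects rows by position, not by index label.
df.loc[mask, "greater"] = "yes"
print(df)
```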