why does ~True not work in pandas dataframe conditional

Time:09-23

I am trying to use switches to turn on and off conditionals in a pandas dataframe. The switches are just boolean variables that will be True or False. The problem is that ~True does not evaluate the same as False as I expected it to. Why does this not work?

>>> dataframe = pd.DataFrame({'col1': [3, 4, 5, 6], 'col2': [6, 5, 4, 3]})
>>> dataframe
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>> dataframe.loc[dataframe.col1 <= dataframe.col2]
   col1  col2
0     3     6
1     4     5
>>> dataframe.loc[(True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>> dataframe.loc[(False) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
>>> dataframe.loc[(~True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>> dataframe.loc[(~(True)) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>>

CodePudding user response:

This is plain Python behavior, not something pandas (or NumPy) does.

True is not a pandas object; it is a Python bool. On a Python bool, the ~ operator is the integer bitwise NOT, not a logical negation. It only inverts booleans element-wise on pandas (and NumPy) objects.

As you can see:

>>> ~True
-2
>>> 

It gives -2 because bool is a subclass of int: True is treated as 1, and int's __invert__ magic method returns -x - 1.

Therefore:

>>> bool(-2)
True
>>> 

Gives True, because any nonzero integer is truthy. So (~True) in the condition still acts as True.
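
To make the arithmetic explicit, here is a minimal sketch in plain Python (no pandas needed). The helper name invert_as_int is made up for illustration. It converts the flag to int first, which is what ~ effectively does on a bool, and compares the result with not:

```python
# bool is a subclass of int, so ~ treats True/False as the integers 1/0.
assert issubclass(bool, int)


def invert_as_int(flag):
    """Hypothetical helper: the value ~flag produces for a plain Python bool."""
    return ~int(flag)  # bitwise complement, equal to -int(flag) - 1


# For each switch value: (result of ~, its truthiness, result of `not`)
results = {
    flag: (invert_as_int(flag), bool(invert_as_int(flag)), not flag)
    for flag in (True, False)
}

for flag, (inverted, truthy, negated) in results.items():
    print(flag, inverted, truthy, negated)
```

Both -2 and -1 are nonzero, so ~ on a Python bool is truthy whichever way the switch is set, while not gives the logical opposite.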

Don't mix up pandas and Python behavior. pandas implements its own __invert__, for example:

>>> ~pd.Series([True])
0    False
dtype: bool
>>> 

As you can see, on pandas (and NumPy) objects, ~ inverts the booleans. Therefore, if you write:

>>> dataframe.loc[~pd.Series([True]).any() | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
>>> 

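You can see that it behaves the same as False. Here ~ is applied to the result of .any(), which in this session is a NumPy boolean rather than a Python bool, so it is negated logically.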

The simplest fix here is to use not:

>>> dataframe.loc[(not True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
>>> 
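
As a rough sketch of the switch pattern from the question, the logic can be wrapped in a small function. The name filter_with_switch and its parameters are assumptions made for illustration, not part of pandas:

```python
import pandas as pd


def filter_with_switch(df, switch, condition):
    """Hypothetical helper: keep all rows when `switch` is on,
    otherwise keep only the rows where `condition` is True."""
    # bool(...) normalises the switch to a plain Python bool, which
    # combines element-wise with the boolean Series via |.
    return df.loc[bool(switch) | condition]


dataframe = pd.DataFrame({'col1': [3, 4, 5, 6], 'col2': [6, 5, 4, 3]})
condition = dataframe.col1 <= dataframe.col2

all_rows = filter_with_switch(dataframe, True, condition)       # switch on
some_rows = filter_with_switch(dataframe, not True, condition)  # switch off
```

Negating the switch with not before passing it in keeps the Python-level logic separate from the element-wise pandas operators.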

CodePudding user response:

I think '~' is not what you want; you probably want to use 'not':

>>> dataframe.loc[(not True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5