Bitwise comparison of "slightly" different DataFrames yields conflicting results

Time:07-19

While working on something involving the bitwise AND operator, I stumbled upon the behavior below.

When I access the Series of the two DataFrames and perform the same conditional check, the returned results differ.

  1. What is happening under the hood in In [95] and In [96]?
  2. And why do the outcomes differ for the two DataFrames?

In [91]: df = pd.DataFrame({"h": [5300, 5420, 5490], "l": [5150, 5270, 5270]})

In [92]: df
Out[92]: 
      h     l
0  5300  5150
1  5420  5270
2  5490  5270

In [93]: df2 = pd.DataFrame({"h": [5300.1, 5420.1, 5490.1], "l": [5150.1, 5270.1, 5270.1]})

In [94]: df2
Out[94]: 
        h       l
0  5300.1  5150.1
1  5420.1  5270.1
2  5490.1  5270.1

In [95]: df["h"].notna() & df["l"]
Out[95]: 
0    False
1    False
2    False
dtype: bool

In [96]: df2["h"].notna() & df2["l"]
Out[96]: 
0    True
1    True
2    True
dtype: bool


Answer:

You've hit some weird implicit casting. I believe what you mean is:

df["h"].notna() & df["l"].notna()

or perhaps

df["h"].notna() & df["l"].astype(bool)
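Applied to both frames from the question, the two suggested expressions can be compared directly. This is a minimal sketch assuming pandas is installed; `edge` is a made-up Series added only to show where the two fixes diverge.

```python
import pandas as pd

# The two frames from the question: integer columns vs. float columns.
df = pd.DataFrame({"h": [5300, 5420, 5490], "l": [5150, 5270, 5270]})
df2 = pd.DataFrame({"h": [5300.1, 5420.1, 5490.1], "l": [5150.1, 5270.1, 5270.1]})

for name, frame in (("df", df), ("df2", df2)):
    # Fix 1: AND two boolean "value is present" masks.
    both_present = frame["h"].notna() & frame["l"].notna()
    # Fix 2: cast "l" to bool explicitly (nonzero -> True, 0 -> False).
    l_truthy = frame["h"].notna() & frame["l"].astype(bool)
    print(name, both_present.tolist(), l_truthy.tolist())

# The fixes are not interchangeable in general: 0 is "present" for notna()
# but False for astype(bool), while NaN is missing for notna() but nonzero,
# hence True, for astype(bool).
edge = pd.Series([0.0, float("nan")])
print(edge.notna().tolist())       # [True, False]
print(edge.astype(bool).tolist())  # [False, True]
```

Which one is right depends on whether "l" should merely be present or also be nonzero.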

In the original,

df["h"].notna() & df["l"]

you have requested a bitwise operation on two Series, the first of which is dtyped as boolean and the second of which is either integer (in df) or float (in df2).

In the first case, a boolean can be upcast to an int. What appears to happen is that the boolean True is upcast to the integer 1 (binary 0000000001) and bitwise-ANDed with the integers 5150, 5270 and 5270. Since all of those are even, their lowest bit is 0, so every result is 0, which then comes back as False. For example, if you set

df.loc[2, 'l'] = 5271

you will see that the final value changes to True.
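The arithmetic described above can be reproduced in plain NumPy. This mirrors what the answer infers pandas is doing for the integer case; it is an interpretation of the observed output, not a reading of pandas internals. `mask` and `l_values` are illustrative names, and the last value is the odd 5271 from the example.

```python
import numpy as np

mask = np.array([True, True, True])      # stands in for df["h"].notna()
l_values = np.array([5150, 5270, 5271])  # df["l"] after setting the last value to 5271

# bool & int: True is upcast to 1, so each result keeps only the lowest bit.
bitwise = mask & l_values
print(bitwise)               # [0 0 1]
print(l_values % 2)          # [0 0 1], i.e. the parity of each value
print(bitwise.astype(bool))  # [False False  True]
```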

In the case of df2, a float and a bool cannot be bitwise-ANDed together. It appears that pandas implicitly converts the float array to bool first (every nonzero value becomes True). NumPy itself does not do this:

In [79]: np.float64([.1, .2]) & np.array([True, True])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-2c2e50f0bf99> in <module>
----> 1 np.float64([.1, .2]) & np.array([True, True])

TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

But pandas seems to allow it:

In [88]: pd.Series([True, True, True]) & pd.Series([0, .1, .2])
Out[88]:
0    False
1     True
2     True
dtype: bool

The same result can be obtained in NumPy by casting to bool explicitly with astype(bool):

In [92]: np.array([True, True, True]) & np.float64([0, .1, .2]).astype(bool)
Out[92]: array([False,  True,  True])
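The behavior inferred in this answer can be summarized as one rule: integer operands are ANDed bit by bit and the result is read as bool, while float operands are cast to bool first. The sketch below encodes that rule in a hypothetical helper, `emulate_bool_and`; it only mimics the outputs shown above and is not pandas' actual implementation. It also shows plain NumPy raising for the float case.

```python
import numpy as np


def emulate_bool_and(mask, other):
    """Hypothetical helper mimicking the observed pandas results.

    Not pandas' real code path; it only reproduces the outputs shown above.
    """
    other = np.asarray(other)
    if np.issubdtype(other.dtype, np.integer):
        # Integer operand: bitwise AND on ints, then reinterpret as bool.
        return (mask & other).astype(bool)
    # Float operand: cast to bool first (nonzero -> True, 0.0 -> False).
    return mask & other.astype(bool)


mask = np.array([True, True, True])
print(emulate_bool_and(mask, [5150, 5270, 5270]))        # [False False False]
print(emulate_bool_and(mask, [5150.1, 5270.1, 5270.1]))  # [ True  True  True]

# Plain NumPy refuses the float case instead of casting.
try:
    np.array([0.1, 0.2]) & np.array([True, True])
except TypeError as exc:
    print("TypeError:", exc)
```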

Answer:

A bitwise AND on floats is not defined, and plain Python raises an error for it, whereas int & bool works:

>>> float(10) & True
...
TypeError: unsupported operand type(s) for &: 'float' and 'bool'

>>> int(10) & True
0
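A short check of the Python-level behavior: `bool` is a subclass of `int`, so `int & bool` is ordinary integer AND, while `float` defines no `&` operator at all. The values are illustrative.

```python
# bool subclasses int, so int & bool is plain integer bitwise AND.
print(10 & True)        # 0  (10 is even, lowest bit 0)
print(11 & True)        # 1  (11 is odd, lowest bit 1)
print(type(10 & True))  # <class 'int'>

# float has no bitwise operators, so the same expression raises.
try:
    10.0 & True
except TypeError as exc:
    print(exc)          # unsupported operand type(s) for &: 'float' and 'bool'
```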

Answer:

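The pandas notna function is mainly meaningful for columns that can hold NaN, such as floating-point columns. There is no NaN value for integer columns, so notna returns True for every entry there; the False values in Out[95] come from ANDing with the integer column, not from notna.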
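Evaluating the two operands of `df["h"].notna() & df["l"]` separately shows which one produces the False values. A minimal check, assuming pandas is installed; the NaN-containing Series is an illustrative addition, not from the question.

```python
import pandas as pd

df = pd.DataFrame({"h": [5300, 5420, 5490], "l": [5150, 5270, 5270]})

# notna() on an integer column: no value can be missing, so every entry is True.
left = df["h"].notna()
print(left.dtype, left.tolist())  # bool [True, True, True]

# notna() only reports False where a value is actually missing (here NaN).
print(pd.Series([1.0, float("nan")]).notna().tolist())  # [True, False]

# The all-False result from the question appears only once the integer column joins the &.
print((left & df["l"]).tolist())
```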
