While working with the bitwise AND operator, I came across the behavior below.
Running the same conditional check on a column from each of two pandas DataFrames returns different results.
- What is happening under the hood in In [95] and In [96]?
- And why do the outcomes differ for the two dataframes?
In [91]: df = pd.DataFrame({"h": [5300, 5420, 5490], "l": [5150, 5270, 5270]})
In [92]: df
Out[92]:
      h     l
0  5300  5150
1  5420  5270
2  5490  5270
In [93]: df2 = pd.DataFrame({"h": [5300.1, 5420.1, 5490.1], "l": [5150.1, 5270.1, 5270.1]})
In [94]: df2
Out[94]:
        h       l
0  5300.1  5150.1
1  5420.1  5270.1
2  5490.1  5270.1
In [95]: df["h"].notna() & df["l"]
Out[95]:
0 False
1 False
2 False
dtype: bool
In [96]: df2["h"].notna() & df2["l"]
Out[96]:
0 True
1 True
2 True
dtype: bool
CodePudding user response:
You've hit some weird implicit casting. I believe what you mean is:
df["h"].notna() & df["l"].notna()
or perhaps
df["h"].notna() & df["l"].astype(bool)
In the original,
df["h"].notna() & df["l"]
you have requested a bitwise operation on two Series, the first of which is dtyped as boolean and the second of which is either integer (in df) or float (in df2).
In the first case, a boolean can be upcast to an int. It appears that the boolean True is upcast to the integer 1 (binary 0000000001) and bitwise-ANDed with the integers 5150, 5270, and 5270. Each of those gives 0, since all three are even, and the integer result is then converted back to bool, which yields False. For example, if you set
df.loc[2, 'l'] = 5271
you will see that the final value changes to True.
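The integer case can be reproduced in plain Python, without pandas. This is only a sketch of the arithmetic described above; the helper name bool_and_int is made up for illustration and is not a pandas function:

```python
def bool_and_int(flag: bool, value: int) -> bool:
    """Mimic the integer case: treat the bool as 0/1, AND it, cast back to bool."""
    # bool is a subclass of int in Python, so True behaves like 1 here.
    return bool(int(flag) & value)


# The three values from column "l", plus the odd value 5271 from the example.
values = [5150, 5270, 5270, 5271]
results = [bool_and_int(True, v) for v in values]
print(results)
```

Since 1 & n only keeps the lowest bit of n, the result is True exactly when the integer is odd.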
In the case of df2, a float and a bool cannot be bitwise-ANDed together. It appears that pandas implicitly converts the float Series to bool here, so every nonzero value becomes True. numpy itself would not do this:
In [79]: np.float64([.1, .2]) & np.array([True, True])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-2c2e50f0bf99> in <module>
----> 1 np.float64([.1, .2]) & np.array([True, True])
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
But pandas seems to allow it:
In [88]: pd.Series([True, True, True]) & pd.Series([0, .1, .2])
Out[88]:
0 False
1 True
2 True
dtype: bool
The same result can be achieved in numpy by calling astype(bool) explicitly:
In [92]: np.array([True, True, True]) & np.float64([0, .1, .2]).astype(bool)
Out[92]: array([False, True, True])
CodePudding user response:
A bitwise operation on a float does not make sense and raises an error at the Python level:
>>> float(10) & True
...
TypeError: unsupported operand type(s) for &: 'float' and 'bool'
>>> int(10) & True
0
CodePudding user response:
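The pandas notna function is mainly useful for columns that can hold missing values, such as floating point columns. A plain integer column cannot contain NaN, so notna returns True for every element of it. The False values in Out[95] therefore come from the bitwise AND with the integer column, not from notna.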