I noticed some strange behavior while checking whether a number is in a list. If the number is a plain Python int, the check fails, but if the number is a numpy.int64, the check succeeds. Can anyone explain why? I know I could do better by generating the list with lst=df['A'].values.tolist() to get a list of integers instead of a list of lists. But my question is: why does the numpy.int64 check succeed below?
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': range(31, 36)})
print(df)
#     A
# 0  31
# 1  32
# 2  33
# 3  34
# 4  35
lst=df.values.tolist()
print(lst)
# [[31], [32], [33], [34], [35]]
x=31
print(x) # 31
print(type(x)) # <class 'int'>
if x in lst:
    print('Yes')
else:
    print('No')
# prints No!
y=df['A'][0]
print(y) # 31
print(type(y)) # <class 'numpy.int64'>
if y in lst:
    print('Yes')
else:
    print('No')
# prints Yes
Answer:
Your list doesn't contain 31. It contains another list that contains 31, but it doesn't directly contain 31.
thing in lst works like this:
for x in lst:
    if x is thing or x == thing:
        return True
return False
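To make that loop something you can actually run, here is a minimal sketch wrapped in a function. The name contains_like_in is made up for illustration, and this is only a Python-level approximation of the membership test, not the interpreter's real implementation. It uses plain Python values only, so it shows the int case from the question.

```python
def contains_like_in(lst, thing):
    """Hypothetical helper that mirrors the membership loop described above."""
    for x in lst:
        # Identity is tried first, then equality. Whatever == returns is
        # interpreted as a truth value by the `if`.
        if x is thing or x == thing:
            return True
    return False


nested = [[31], [32], [33]]

# A plain int is never equal to a list, so no element matches.
print(contains_like_in(nested, 31))    # False

# The list [31] is equal to the first element, so this matches.
print(contains_like_in(nested, [31]))  # True

# The built-in operator agrees with the sketch for these inputs.
print(31 in nested, [31] in nested)    # False True
```

The point is that the test is made against each element as a whole; it never looks inside the inner lists.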
When you check whether a regular int is in your list, x == thing is always False, because all elements of your list are more lists, and an int is never equal to a list. However, with a numpy.int64, the comparison broadcasts. When you compare
numpy.int64(31) == [31]
[31] is converted to a NumPy array, and you get an array of elementwise comparison results, comparing numpy.int64(31) to every element of the array, resulting in
numpy.array([True])
A one-element NumPy boolean array is treated as its single element in an if check, so when the list's in logic compares numpy.int64(31) against [31], it thinks these are equal, and it reports True as the result of the in check.
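Here is a small sketch you can run to see each step of that explanation, using NumPy only and no DataFrame. The variable names are mine, and the flattening at the end is just one possible way to get a list of plain numbers (the question's df['A'].values.tolist() would do the same job). The case with a two-element inner list is an extra illustration, not something from the question.

```python
import numpy as np

y = np.int64(31)
nested = [[31], [32], [33]]

# Comparing a NumPy scalar with a list converts the list to an array
# and compares elementwise, so the result is an array, not a bool.
result = y == [31]
print(type(result), result.shape)  # a one-element ndarray

# A one-element boolean array has the truth value of its single element.
print(bool(result))

# That is why membership succeeds for numpy.int64 but not for int.
print(y in nested)
print(31 in nested)

# With more than one element in an inner list, the truth value of the
# comparison result is ambiguous and the membership test raises.
raised = False
try:
    y in [[31, 32]]
except ValueError:
    raised = True
print(raised)

# Flattening the nested list gives the check you probably wanted.
flat = [value for row in nested for value in row]
print(31 in flat, y in flat)
```

So the Yes in the question is really a side effect of how NumPy comparisons behave, and it only happens to work because every inner list has exactly one element.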