For an all-NaN list a = [np.nan, np.nan], a.index(np.nan) returns 0, while for the NaN returned by b = np.nanmax(a), a.index(b) raises a ValueError. The object ids of np.nan and b are different. However, if a were [2, 3.1] and c = np.array(a).tolist(), then id(a[1]) and id(c[1]) would be different as well, yet a.index(c[1]) raises no ValueError.
How does list.index() work under the hood? Does it compare for value equality (I guess not, otherwise a.index(np.nan) should raise an error, because np.nan != np.nan)? Does it compare object ids (again I guess not, otherwise a.index(c[1]) should raise an error)? Why does a.index(np.nanmax(a)) fail for a = [np.nan, np.nan], while a.index(np.nan) works?
import numpy as np
a = [np.nan, np.nan]
b = np.nanmax(a)
print(id(np.nan), id(a[0]), id(a[1]), id(b))
a.index(np.nan)
a.index(b)
# Output:
# 47021195940144 47021195940144 47021195940144 47021566155984
# ...
# File "<ipython-input-2-fb7cc8fa88c0>", line 9, in <module>
# a.index(b)
# ValueError: nan is not in list
CodePudding user response:
Implementation of list.index
If you want to see how index is implemented (in C), you can look at the CPython source (Objects/listobject.c).
To make it easier to understand, I rewrote it in Python:
import sys

def index(self, value, start=0, stop=sys.maxsize, /):
    # make sure that start and stop are within the list boundaries;
    # negative values count from the end of the list
    if start < 0:
        start += len(self)
        if start < 0:
            start = 0
    if stop < 0:
        stop += len(self)
        if stop < 0:
            stop = 0
    # iterate through the list and try to find the value;
    # enumerate starts at `start` so the returned position refers to the whole list
    for i, obj in enumerate(self[start:stop], start):
        # identity is checked first, then equality
        if obj is value or obj == value:
            return i
    raise ValueError("%r is not in list" % (value,))
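As a quick check of what the rewrite has to reproduce, here is a small sketch using only the built-in list.index. The list contents and variable names are made up for illustration. It shows that the returned position always refers to the whole list, even when a start argument (including a negative one) is given.

```python
# Built-in behaviour that a pure-Python rewrite of list.index should match.
values = [10, 20, 30, 20]

from_start = values.index(20)          # search the entire list
skip_first = values.index(20, 2)       # start searching at position 2
negative_start = values.index(20, -1)  # -1 is interpreted as len(values) - 1

# Positions are reported relative to the whole list, not to the searched slice.
print(from_start, skip_first, negative_start)  # 1 3 3
```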
Details of why the implementation is like that
To understand this part, I suggest reading the implementation referenced earlier.
All the magic happens in PyObject_RichCompareBool: when it is called the way index calls it (with Py_EQ), it behaves like x is y or x == y.
This is also stated in the docs:
int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid)
Compare the values of o1 and o2 using the operation specified by opid, which must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, or Py_GE, corresponding to <, <=, ==, !=, >, or >= respectively. Returns -1 on error, 0 if the result is false, 1 otherwise. This is the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.
Note If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE.
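The note above can be modelled in a few lines of Python. This is only a sketch of the documented Py_EQ behaviour, not the actual C code. The function name rich_compare_bool_eq is hypothetical, and plain float('nan') objects stand in for the NumPy values so the example needs no third-party imports.

```python
def rich_compare_bool_eq(o1, o2):
    """Hypothetical Python model of PyObject_RichCompareBool(o1, o2, Py_EQ)."""
    if o1 is o2:
        # identity shortcut: the same object always counts as equal
        return True
    # otherwise fall back to the ordinary == comparison
    return bool(o1 == o2)

nan_a = float("nan")
nan_b = float("nan")  # a second, distinct NaN object

same_object = rich_compare_bool_eq(nan_a, nan_a)        # True, via identity
different_objects = rich_compare_bool_eq(nan_a, nan_b)  # False, NaN != NaN
plain_equality = nan_a == nan_a                         # False, == has no shortcut

print(same_object, different_objects, plain_equality)
```

This is why a NaN object that is already stored in the list is found, while any other NaN object is not.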
The -1 case (an error raised during the comparison) is handled by Python, so we don't need to worry about it: Python raises the exception and stops running our code.
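To illustrate the error case, here is a minimal sketch with a made-up class (Explosive is hypothetical) whose __eq__ always raises. The exception raised during the comparison simply propagates out of list.index.

```python
class Explosive:
    """Hypothetical object whose equality comparison always fails."""

    def __eq__(self, other):
        raise RuntimeError("comparison failed")

items = [Explosive()]

try:
    items.index(1)  # comparing the stored object with 1 raises inside index
    outcome = "no error"
except RuntimeError as exc:
    # the error from __eq__ reaches the caller unchanged
    outcome = str(exc)

print(outcome)  # comparison failed
```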
So how does it work?
Applying this knowledge, we can see why the behaviour is like that:
import numpy as np
instance1 = np.nan
l = [instance1]
instance2 = np.nanmax(l) # RuntimeWarning: All-NaN axis encountered
print(instance1 is instance2 or instance1 == instance2)
# False therefore ValueError
import numpy as np
instance1 = 3.1
l = [instance1]
instance2 = np.array(l).tolist()[0]
print(instance1 is instance2 or instance1 == instance2)
# True (instance1 == instance2) therefore no ValueError
Additionally
Here are generalized versions of your examples:
import numpy as np
instance1 = np.nan
l = [instance1]
instance2 = np.nanmax(l) # RuntimeWarning: All-NaN axis encountered
assert instance1 is l[0]
assert instance1 is not instance2
assert not l.index(instance1)
assert not l.index(instance2) # ValueError: nan is not in list
and
import numpy as np
instance1 = 3.1
l = [instance1]
instance2 = np.array(l).tolist()[0]
assert instance1 is l[0]
assert instance1 is not instance2
assert not l.index(instance1)
assert not l.index(instance2) # no ValueError
CodePudding user response:
In Python you can make a NaN-valued object with:
In [80]: mynan=float('nan')
In [81]: id(mynan)
Out[81]: 139640449759024
Make another and get a different id:
In [82]: mynan=float('nan')
In [83]: id(mynan)
Out[83]: 139640449757264
numpy has its own version:
In [84]: id(np.nan)
Out[84]: 139640952170000
That always gives the same id within a particular session, because np.nan is a single float object stored as a module attribute.
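The same effect can be seen with the standard library alone. The sketch below uses math.nan as a stand-in for np.nan (both are module-level float objects) and assumes CPython, where float('nan') builds a new object on every call.

```python
import math

first_lookup = math.nan
second_lookup = math.nan   # same module attribute, so the same object
fresh_nan = float("nan")   # a newly created NaN object

same_constant = first_lookup is second_lookup
fresh_is_constant = fresh_nan is first_lookup

print(same_constant, fresh_is_constant)  # True False
```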
Make a list:
In [85]: a = [.1, np.nan, .3, mynan]
np.isnan can test for NaN values even where id and value comparisons don't work:
In [86]: np.isnan(a)
Out[86]: array([False, True, False, True])
As far as I know, list.index first tests for identity (id), then for ==. Remember that lists store elements by reference.
In [87]: a.index(np.nan)
Out[87]: 1
In [88]: a.index(mynan)
Out[88]: 3
In [89]: a.index(float('nan'))
Traceback (most recent call last):
File "<ipython-input-89-33bf9e0279e3>", line 1, in <module>
a.index(float('nan'))
ValueError: nan is not in list
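If the goal is to locate a NaN regardless of which object it is, one option is to test the value instead of relying on index. The helper below is a sketch under that assumption. first_nan_index is a hypothetical name, and it uses math.isnan rather than np.isnan so it runs without NumPy.

```python
import math

def first_nan_index(values):
    """Hypothetical helper: position of the first NaN in values."""
    for i, v in enumerate(values):
        # math.isnan inspects the value, so object identity does not matter
        if isinstance(v, float) and math.isnan(v):
            return i
    raise ValueError("no NaN in list")

data = [0.1, float("nan"), 0.3]
position = first_nan_index(data)
print(position)  # 1
```

The isinstance check skips non-float entries such as strings, which math.isnan would reject with a TypeError.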