Python find index of nan in nan-list yields error only sometimes?

Time:09-24

For an all-NaN list a = [np.nan, np.nan], a.index(np.nan) returns 0, while for the NaN returned by b = np.nanmax(a), a.index(b) raises a ValueError. The object ids of np.nan and b are different. However, if a were [2, 3.1] and c = np.array(a).tolist(), then id(a[1]) and id(c[1]) would be different as well, yet a.index(c[1]) raises no ValueError. Why?

How does list.index() work under the hood? Does it compare by value equality (I guess not, otherwise a.index(np.nan) would raise an error because np.nan != np.nan)? By object identity (again I guess not, otherwise a.index(c[1]) would raise an error)? Why does a.index(np.nanmax(a)) fail for a = [np.nan, np.nan], while a.index(np.nan) works?

import numpy as np

a = [np.nan, np.nan]
b = np.nanmax(a)

print(id(np.nan), id(a[0]), id(a[1]), id(b))

a.index(np.nan)
a.index(b)

# Output:
# 47021195940144 47021195940144 47021195940144 47021566155984
#   ...
#   File "<ipython-input-2-fb7cc8fa88c0>", line 9, in <module>
#     a.index(b)
# ValueError: nan is not in list

Answer:

Implementation of list.index

If you want to see how index is implemented in C, you can read the CPython source for list.index.
To make it easier to understand, I rewrote it in Python:

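import sys


def index(self, value, start=0, stop=sys.maxsize, /):
    # normalize negative start/stop like slice bounds, clamping at 0
    if start < 0:
        start += len(self)
        if start < 0:
            start = 0
    if stop < 0:
        stop += len(self)
        if stop < 0:
            stop = 0

    # scan the list; len(self) is re-read each step, as in the C code,
    # because a comparison could mutate the list
    i = start
    while i < stop and i < len(self):
        obj = self[i]
        # identity first, then equality (PyObject_RichCompareBool with Py_EQ)
        if obj is value or obj == value:
            return i  # position in the whole list, not in the slice
        i += 1

    raise ValueError(f"{value!r} is not in list")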
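The sketch reports positions counted from the beginning of the whole list. A quick check against the real built-in list.index (the list contents below are just illustrative) shows how start and stop behave, including a negative start:

```python
# How the built-in list.index treats its optional start/stop bounds
items = ["a", "b", "c", "b"]

print(items.index("b"))      # 1: first match from the beginning
print(items.index("b", 2))   # 3: position counted from the start of the list
print(items.index("b", -2))  # 3: -2 is normalized to len(items) - 2 == 2

error = None
try:
    items.index("b", 2, 3)   # only position 2 ("c") is searched
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```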

Why the implementation works like that

To understand this part, I suggest reading the C implementation mentioned above.

All the magic happens in PyObject_RichCompareBool:
when it is called with Py_EQ, as list.index does, it behaves like x is y or x == y.

This is also stated in the C API docs (list.index uses Py_EQ):

int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid)

Compare the values of o1 and o2 using the operation specified by opid, which must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, or Py_GE, corresponding to <, <=, ==, !=, >, or >= respectively. Returns -1 on error, 0 if the result is false, 1 otherwise. This is the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.

Note: If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE.
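To see that identity shortcut in isolation, here is a small sketch using a made-up class, NeverEqual (purely illustrative, not from any library), whose == always returns False, much like NaN:

```python
class NeverEqual:
    """Illustrative class whose == always returns False, mimicking NaN."""

    def __eq__(self, other):
        return False


x = NeverEqual()
items = [x]

print(x == x)                 # False: equality alone never matches
print(items.index(x))         # 0: found by the identity check
print(x in items)             # True: `in` uses the same comparison
print(NeverEqual() in items)  # False: a different instance fails both checks
```

The same thing happens with a = [np.nan, np.nan]: np.nan is found by identity even though np.nan == np.nan is False.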

The -1 (error) case needs no special handling in the Python version: if == raises an exception, it simply propagates out of index and stops our code.
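A minimal sketch of that error case, assuming a made-up class Exploding whose == always raises: the identity check still finds the object without calling __eq__, but searching for anything else lets the exception escape from list.index.

```python
class Exploding:
    """Illustrative class whose == always raises."""

    def __eq__(self, other):
        raise RuntimeError("comparison failed")


boom = Exploding()
items = [boom]

# Identity short-circuits, so __eq__ is never called here
print(items.index(boom))  # 0

# For a different value, == is attempted and its exception propagates
error = None
try:
    items.index(42)
except RuntimeError as exc:
    error = exc
    print("propagated:", exc)  # propagated: comparison failed
```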

So how does it work?

Applying this knowledge, we can see why the behaviour is like that:

import numpy as np

instance1 = np.nan

l = [instance1]
instance2 = np.nanmax(l)  # RuntimeWarning: All-NaN axis encountered

print(instance1 is instance2 or instance1 == instance2)
# False, therefore ValueError

import numpy as np

instance1 = 3.1

l = [instance1]
instance2 = np.array(l).tolist()[0]

print(instance1 is instance2 or instance1 == instance2)
# True (instance1 == instance2) therefore no ValueError
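A quick way to see why the identity check cannot rescue the np.nanmax case: np.nanmax hands back a freshly created NumPy scalar (numpy.float64), not the np.nan object stored in the list. This sketch silences the "All-NaN axis encountered" RuntimeWarning only to keep the output clean.

```python
import warnings

import numpy as np

a = [np.nan, np.nan]
with warnings.catch_warnings():
    # np.nanmax warns for an all-NaN input; ignore it for this demo
    warnings.simplefilter("ignore", RuntimeWarning)
    b = np.nanmax(a)

print(type(b))               # <class 'numpy.float64'>
print(isinstance(b, float))  # True: numpy.float64 subclasses float
print(b is np.nan)           # False: a new object, so identity fails
print(b == b)                # False: NaN never equals itself
print(a.index(np.nan))       # 0: np.nan itself is matched by identity
```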

Additionally

Here are your examples in generalized form:

import numpy as np

instance1 = np.nan

l = [instance1]
instance2 = np.nanmax(l)  # RuntimeWarning: All-NaN axis encountered

assert instance1 is l[0]
assert instance1 is not instance2

assert l.index(instance1) == 0
assert l.index(instance2) == 0  # ValueError: nan is not in list

and

import numpy as np

instance1 = 3.1

l = [instance1]
instance2 = np.array(l).tolist()[0]

assert instance1 is l[0]
assert instance1 is not instance2

assert l.index(instance1) == 0
assert l.index(instance2) == 0  # no ValueError

Answer:

In Python you can make a NaN-valued object with:

In [80]: mynan=float('nan')
In [81]: id(mynan)
Out[81]: 139640449759024

Make another and get a different id:

In [82]: mynan=float('nan')
In [83]: id(mynan)
Out[83]: 139640449757264

numpy has its own version:

In [84]: id(np.nan)
Out[84]: 139640952170000

np.nan is a single float object stored on the numpy module, so it always gives the same id within a session.
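A small check of that difference, showing that np.nan is one shared Python float while each float('nan') call creates a new object:

```python
import numpy as np

# np.nan is one plain Python float stored on the numpy module
print(type(np.nan))      # <class 'float'>
print(np.nan is np.nan)  # True: the same object every time

# Each float('nan') call builds a new object
x = float("nan")
y = float("nan")
print(x is y)            # False: two distinct objects
print([x].index(x))      # 0: matched by identity
```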

Make a list:

In [85]: a = [.1, np.nan, .3, mynan]

np.isnan can test for NaN values even where identity and equality checks fail:

In [86]: np.isnan(a)
Out[86]: array([False,  True, False,  True])
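Building on that mask, one way to get the NaN positions themselves is np.flatnonzero, which returns the indices of the nonzero (True) entries. A sketch using the same list:

```python
import numpy as np

mynan = float("nan")
a = [.1, np.nan, .3, mynan]

# Boolean mask of NaN positions, then the indices where it is True
nan_positions = np.flatnonzero(np.isnan(a))
print(nan_positions)  # [1 3]

# First NaN position, or None if there is no NaN
first_nan = int(nan_positions[0]) if nan_positions.size else None
print(first_nan)      # 1
```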

list.index first tests identity (is), then equality (==). Remember that lists store references to objects, not copies.
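The same "is, then ==" rule also applies to list.count and the in operator, so they treat NaNs the same way. A sketch with the list above:

```python
import numpy as np

mynan = float("nan")
a = [.1, np.nan, .3, mynan]

print(a.count(np.nan))        # 1: only the identical object matches
print(a.count(mynan))         # 1
print(float("nan") in a)      # False: a fresh NaN matches nothing
print(a.count(float("nan")))  # 0
```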

In [87]: a.index(np.nan)
Out[87]: 1
In [88]: a.index(mynan)
Out[88]: 3
In [89]: a.index(float('nan'))
Traceback (most recent call last):
  File "<ipython-input-89-33bf9e0279e3>", line 1, in <module>
    a.index(float('nan'))
ValueError: nan is not in list
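If you need the position of any NaN regardless of which object it is, a pure-Python option is to test values with math.isnan instead of relying on list.index. first_nan_index below is a hypothetical helper written for this sketch, not a standard function; it skips items that are not floats (numpy.float64 counts, since it subclasses float).

```python
import math


def first_nan_index(seq):
    """Hypothetical helper: position of the first NaN float in seq.

    Uses math.isnan instead of identity/equality, so any NaN float
    matches (float('nan'), np.nan, a numpy.float64 NaN, ...).
    Items that are not floats are skipped.
    """
    for i, item in enumerate(seq):
        if isinstance(item, float) and math.isnan(item):
            return i
    raise ValueError("no NaN in sequence")


print(first_nan_index([.1, float("nan"), .3]))  # 1
```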