Home > Mobile >  Numpy: Can't search for tuple in list
Numpy: Can't search for tuple in list

Time:05-05

I am trying to search for a tuple in a list

import numpy as np
l = [(0, 0), (1, 1), (3, 4)]
print(np.where(l == (0, 0)))
 >>>> (array([], dtype=int64),)

It's not possible for some reason. It's weird to me, because l[0] returns (0, 0)

CodePudding user response:

TL;DR

To achive what you seems you are after with vectorized code you need to cast l (or (0, 0)) to a NumPy array and use np.all(..., axis=1):

import numpy as np
l = [(0, 0), (1, 1), (3, 4)]
print(np.nonzero(np.all(np.array(l) == (0, 0), axis=1))[0])
# [0]

You should probably take a step back and try to understand the single pieces of your code.

  1. Your input is a list. The equal operator == on a list act on the whole list will give you either True or False depending on whether the two lists contain the same elements.
  2. np.where() without further options (which should be avoided in favor of np.nonzero()) will look for non-zero entries in your array and return the indices of such entries.

Hence, np.where() on a scalar boolean will return either (array([0]),) or (array([], dtype=int64),) for True or False respectively.

If you were to use == on a NumPy array the result would still be different from what you expected, essentially because == will work element-wise in this case (eventually broadcasting the arguments). For example:

a = np.array([[0, 0], [1, 1], [2, 2]])
b = np.array([0, 1])
a == b
# [[ True False]
#  [False  True]
#  [False False]]

To obtain what you are after with vectorized code you would need to find the rows where the values are all True, effectively collapsing/reducing one dimension/axis with np.all(..., axis=...), e.g.:

a = np.array([[0, 0], [1, 1], [2, 2]])
c = np.array([0, 0])
np.all(a == c, axis=1)
# [ True, False, False]

which would essentially check that all elements in along axis 1 are True.

Now, you can use np.nonzero() to find the indices:

np.nonzero(np.all(a == c, axis=1))[0]
# [0]

CodePudding user response:

If you do not need to work with NumPy, a simpler approach is to just loop through the list.

Assuming the input as:

l = [(0, 0), (1, 1), (3, 4)] * 2
x = (0, 0)

One such implementation could be:

def search_all(items, x):
    for i, item in enumerate(items):
        if item == x:
            yield i


print(list(search_all(l, x))
# [0, 3]

If the list is sufficiently sparse, it may be faster to loop through the list using the list.index() method, which forces some of the iteration through faster pathways. This is essentially implemented in FlyingCircus index_all().

def index_all(seq, item):
    i = 0
    try:
        while True:
            i = seq.index(item, i)
            yield i
            i  = 1
    except ValueError:
        return


print(list(index_all(l, x))
# [0, 3]

Both methods above are very memory efficient, and reasonably fast. For faster implementations, one could use Cython to accelerate the explicit looping.

As @KellyBundy suggests, it is even possible to do all the looping implicitly:

import itertools
import operator


def search_all_iter(seq, item):
    return itertools.compress(
        itertools.count(),
        map(operator.eq, seq, itertools.repeat(item)))


print(list(search_all_iter(l, x))
# [0, 3]

If the inputs are NumPy array (and hence the conversion to NumPy array is not needed), and we can start from al/ax:

import numpy as np


al = np.array(l)
ax = np.array(x)

then vectorized approaches (which however require reasonably large temporary objects) such as find_index_np() (borrowed from here) or find_index_unique() (borrowed from there) can be devised:

def find_index_np(arr, subarr):
    return np.nonzero(np.all(arr == subarr, axis=1))[0]


print(find_index_np(al, ax))
# [0 3]
def find_index_unique(arr, subarr):
    return np.unique(np.array(list(zip(*np.nonzero(arr == subarr))))[:, :1])


print(find_index_unique(al, ax))
# [0 3]

Alternatively, it may be possible to write memory efficient and fast approaches working well with NumPy arrays using Numba acceleration:

@nb.njit
def all_eq(a, b):
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


@nb.njit
def find_index_nb(arr, subarr):
    result = np.empty(arr.size, dtype=np.int_)
    j = 0
    for i, x in enumerate(arr):
        if all_eq(x, subarr):
            result[j] = i
            j  = 1
    return result[:j].copy()


print(find_index_np(al, ax))
# [0 3]

Finally, please find some benchmarks for the above:

l = ([(0, 0)]   [(1, 1)] * 9) * 1000
x = (0, 0)
al = np.array(l)
ax = np.array(x)


%timeit list(index_all(l, x))
# 1000 loops, best of 5: 500 µs per loop
%timeit list(search_all(l, x))
# 1000 loops, best of 5: 911 µs per loop
%timeit list(search_all_iter(l, x))
# 1000 loops, best of 5: 701 µs per loop
%timeit find_index_np(al, ax)
# 1000 loops, best of 5: 247 µs per loop
%timeit find_index_unique(al, ax)
# 1000 loops, best of 5: 1.41 ms per loop
%timeit find_index_nb(al, ax)
# 1000 loops, best of 5: 465 µs per loop

Note that all these approaches are O(N) in time complexity (their computation time increases linearly with the size of the input), but their relative speed depends heavily also on how often the item being searched is found, e.g. if we increase the occurrences of x/ax from one every ten to one every two then index_all() becomes slower than search_all():

l = ([(0, 0)]   [(1, 1)]) * 5000
x = (0, 0)
al = np.array(l)
ax = np.array(x)


%timeit list(index_all(l, x))
# 1000 loops, best of 5: 1.22 ms per loop
%timeit list(search_all(l, x))
# 1000 loops, best of 5: 1.01 ms per loop
%timeit list(search_all_iter(l, x))
# 1000 loops, best of 5: 666 µs per loop
%timeit find_index_np(al, ax)
# 1000 loops, best of 5: 250 µs per loop
%timeit find_index_unique(al, ax)
# 100 loops, best of 5: 6.91 ms per loop
%timeit find_index_nb(al, ax)
# 1000 loops, best of 5: 483 µs per loop

CodePudding user response:

First things first, if we want to use numpy, let's ensure that we are working on a proper ndarray:

#I add some more data for test purposes
l = np.array([(0, 0), (1, 1), (3, 4), (3, 6), (0,0)]) 

> array([[0, 0],
         [1, 1],
         [3, 4],
         [3, 6],
         [0, 0]])

Afterwards, I recommend using np.nonzero in order to find all the indices that respect a certain condition, the problem is that it seems like numpy is unpacking the tuple and giving me indices to each of the tuples' elements if the tuple respects the condition, to circumvent this, we just pick the first dimension of the index and apply np.unique to get rid of the duplicates. (I suspect there is a much more elegant solution that I am obviously missing).

the final result is:

np.unique(np.array(list(zip(*np.nonzero(l == [0,0]))))[:,0:1])

> array([0, 4], dtype=int64)

do note that, as commenters have suggested, the solution is probably more readable in a list comprehension format:

#We avoid using an ndarray since we will be using vanilla python anyway
l = [(0, 0), (1, 1), (3, 4), (3, 6), (0,0)]
l_indx = [i for i, tup in enumerate(l) if tup == (0,0)]

> [0, 4]

CodePudding user response:

you could try the following :

l = [(0,0), (1,1), (2,2)]

def in_list(l, a):
    try:
        return l.index(a)
    except:
        return False

print(in_list(l, (0,0)))

>>>0

print(in_list(l, (0,10)))
>>>False

which gives you back the index in the list, if there, otherwise False (or anything else, which would be appropriate in your case).

  • Related