Is it possible to find the 0th index-position of a 2D numpy array (not) containing a given value?

What I want, and expect

I have a 2D numpy array containing integers. My goal is to find the indices of the rows (sub-arrays) that do not contain a given value, using numpy functions. Here is an example of such an array, named ortho_disc:

>>> ortho_disc 
Out: [[1 1 1 0 0 0 0 0 0]
      [1 0 1 1 0 0 0 0 0]
      [0 0 0 0 0 0 2 2 0]]

If I wish to find the arrays not containing 2, I would expect an output of [0, 1], as the first and second arrays of ortho_disc do not contain the value 2.

What I have tried

I have looked into np.argwhere, np.nonzero, np.isin and np.where without getting the expected results. My best attempt using np.where was the following:

>>> np.where(2 not in ortho_disc, [True]*3, [False]*3) 
Out: [False False False]

But it does not return the expected [True, True, False]. This is especially weird when we look at the result for each of ortho_disc's arrays evaluated individually:

>>> 2 not in ortho_disc[0] 
Out: True

>>> 2 not in ortho_disc[1] 
Out: True

>>> 2 not in ortho_disc[2]
Out: False

Using argwhere

Using np.argwhere, all I get is an empty array (not the expected [0, 1]):

>>> np.argwhere(2 not in ortho_disc) 
Out: []

I suspect this is because numpy first flattens ortho_disc, then checks the truth-value of 2 not in ortho_disc? The same empty array is returned using np.nonzero(2 not in ortho_disc).

My code

import numpy as np
ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0,]])
polymer = 2

print(f'>>> ortho_disc \nOut:\n{ortho_disc}\n')
print(f'>>> {polymer} not in {ortho_disc[0]} \nOut: {polymer not in ortho_disc[0]}\n')
print(f'>>> {polymer} not in {ortho_disc[1]} \nOut: {polymer not in ortho_disc[1]}\n')
print(f'>>> {polymer} not in {ortho_disc[2]} \nOut: {polymer not in ortho_disc[2]}\n\n')

breakpoint = np.argwhere(polymer not in ortho_disc)
print(f'>>>np.argwhere({polymer} not in ortho_disc) \nOut: {breakpoint}\n\n\n')

Output:

>>> ortho_disc 
Out:
[[1 1 1 0 0 0 0 0 0]
 [1 0 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 2 2 0]]

>>> 2 not in [1 1 1 0 0 0 0 0 0] 
Out: True

>>> 2 not in [1 0 1 1 0 0 0 0 0] 
Out: True

>>> 2 not in [0 0 0 0 0 0 2 2 0] 
Out: False


>>>np.argwhere(2 not in ortho_disc) 
Out: []

Expected output

From the bottom two lines:

breakpoint = np.argwhere(polymer not in ortho_disc)
print(f'>>>np.argwhere({polymer} not in ortho_disc) \nOut: {breakpoint}\n\n\n')

I expect the following output:

>>>np.argwhere(2 not in ortho_disc) 
Out: [0, 1]

Summary

I would really love feedback on how to solve this issue, as I have been scratching my head for ages over what seems to be an easy problem. It is important to avoid the obvious 'easy-way-out' loop over ortho_disc; I would prefer a solution that uses numpy functions.

Thanks in advance!

Answer 1

In [13]: ortho_disc
Out[13]: 
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 2, 2, 0]])

In [14]: polymer = 2

In [15]: (ortho_disc != polymer).all(axis=1).nonzero()[0]
Out[15]: array([0, 1])

Breaking it down: ortho_disc != polymer is an array of bools:

In [16]: ortho_disc != polymer
Out[16]: 
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True, False, False,  True]])

We want the rows that are all True; for that, we can apply the all() method along axis 1 (i.e. along the rows):

In [17]: (ortho_disc != polymer).all(axis=1)
Out[17]: array([ True,  True, False])

That's the boolean mask for the rows that do not contain polymer.
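If the rows themselves are wanted rather than their indices, the mask can be used directly for boolean indexing. A minimal sketch, assuming the same ortho_disc and polymer as above (the names mask, rows_without_polymer and rows_with_polymer are only illustrative):

```python
import numpy as np

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])
polymer = 2

# One boolean per row: True where the row contains no `polymer`.
mask = (ortho_disc != polymer).all(axis=1)

# Boolean indexing along axis 0 keeps the rows where the mask is True.
rows_without_polymer = ortho_disc[mask]

# Inverting the mask with ~ selects the rows that do contain `polymer`.
rows_with_polymer = ortho_disc[~mask]

print(rows_without_polymer)
print(rows_with_polymer)
```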

Use nonzero() to find the indices of the values that are not 0 (True is considered nonzero, False is 0):

In [19]: (ortho_disc != polymer).all(axis=1).nonzero()
Out[19]: (array([0, 1]),)

Note that nonzero() returned a tuple with length 1; in general, it returns a tuple with the same length as the number of dimensions of the array. Here the input array is 1-d. Pull out the desired result from the tuple by indexing with [0]:

In [20]: (ortho_disc != polymer).all(axis=1).nonzero()[0]
Out[20]: array([0, 1])
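For repeated use, the steps above could be wrapped in a small helper. This is only a sketch: the function name rows_without is made up for illustration, and it uses np.flatnonzero, which returns the indices directly for a 1-d mask instead of a tuple, so the trailing [0] is not needed.

```python
import numpy as np

def rows_without(arr, value):
    """Return the indices of the rows of a 2D array that do not contain `value`.

    Hypothetical helper for illustration; not part of numpy.
    """
    arr = np.asarray(arr)
    # True for each row in which every element differs from `value`.
    mask = (arr != value).all(axis=1)
    # flatnonzero gives the indices of the True entries as a plain 1-d array.
    return np.flatnonzero(mask)

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])

print(rows_without(ortho_disc, 2))  # [0 1]
```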

Answer 2

You can use numpy broadcasting for this. ortho_disc != 2 will return a mask of the array where each value is True if that value in the array was not 2, and False if it was 2. Then, use np.all with axis=1 to condense each row into a boolean indicating whether or not that row contained only True values (a True value means there is no 2 there):

>>> np.all(ortho_disc != 2, axis=1)
array([ True,  True, False])

If you want to get the indices out of that, just use np.where with the above:

>>> np.where(np.all(ortho_disc != 2, axis=1))[0]
array([0, 1])
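As for why the attempts in the question fail: 2 not in ortho_disc is evaluated by Python before numpy ever sees it, and it yields a single bool for the whole array (roughly not (ortho_disc == 2).any()). np.where and np.argwhere therefore only receive the scalar False, not a per-row result. A small sketch to illustrate, reusing the array from the question (the variable names are only illustrative):

```python
import numpy as np

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])

# `not in` is a Python operator: it produces one bool for the whole array.
scalar = 2 not in ortho_disc
print(type(scalar), scalar)  # <class 'bool'> False

# Roughly what the membership test computes on an ndarray.
equivalent = not (ortho_disc == 2).any()
print(equivalent)  # False

# np.where broadcasts the scalar condition, so every element comes
# from the second choice list ([False] * 3).
where_result = np.where(scalar, [True] * 3, [False] * 3)
print(where_result)  # [False False False]

# np.argwhere on a scalar False finds no nonzero entries at all.
print(np.argwhere(scalar).size)  # 0
```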