Is it possible to find the 0th index-position of a 2D numpy array (not) containing a given value?
What I want, and expect
I have a 2D numpy array containing integers. My goal is to find the indices of the row(s) that do not contain a given value (using numpy functions). Here is an example of such an array, named ortho_disc:
>>> ortho_disc
Out: [[1 1 1 0 0 0 0 0 0]
[1 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 2 2 0]]
If I want to find the rows not containing 2, I would expect an output of [0, 1], as the first and second rows of ortho_disc do not contain the value 2.
What I have tried
I have looked into np.argwhere, np.nonzero, np.isin and np.where without getting the expected results. My best attempt using np.where was the following:
>>> np.where(2 not in ortho_disc, [True]*3, [False]*3)
Out: [False False False]
But it does not return the expected [True, True, False]. This is especially weird when we look at the output for each of ortho_disc's rows evaluated by itself:
>>> 2 not in ortho_disc[0]
Out: True
>>> 2 not in ortho_disc[1]
Out: True
>>> 2 not in ortho_disc[2]
Out: False
Using argwhere
Using np.argwhere, all I get is an empty array (not the expected [0, 1]):
>>> np.argwhere(2 not in ortho_disc)
Out: []
I suspect this is because numpy first flattens ortho_disc, then checks the truth value of 2 not in ortho_disc?
The same empty array is returned using np.nonzero(2 not in ortho_disc).
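A quick check of what the in test actually produces (a sketch; the names membership and per_row are illustrative, not from the original post) suggests the cause: Python resolves 2 not in ortho_disc into a single bool before np.where or np.argwhere ever runs. For an ndarray, x in arr behaves like (arr == x).any(), so the whole 2-D array collapses to one value.

```python
import numpy as np

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])

# The `in` operator always yields ONE Python bool for the whole array,
# roughly equivalent to not (ortho_disc == 2).any()
membership = 2 not in ortho_disc
print(type(membership), membership)                    # <class 'bool'> False

# np.where only sees that single scalar condition and broadcasts it
print(np.where(membership, [True] * 3, [False] * 3))   # [False False False]

# A per-row answer needs an element-wise comparison reduced along axis 1
per_row = ~(ortho_disc == 2).any(axis=1)
print(per_row)                                         # [ True  True False]
```

So the flattening suspicion is close: the membership test does scan every element, but it returns one bool instead of one bool per row.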
My code
import numpy as np
ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])
polymer = 2
print(f'>>> ortho_disc \nOut:\n{ortho_disc}\n')
print(f'>>> {polymer} not in {ortho_disc[0]} \nOut: {polymer not in ortho_disc[0]}\n')
print(f'>>> {polymer} not in {ortho_disc[1]} \nOut: {polymer not in ortho_disc[1]}\n')
print(f'>>> {polymer} not in {ortho_disc[2]} \nOut: {polymer not in ortho_disc[2]}\n\n')
breakpoint = np.argwhere(polymer not in ortho_disc)
print(f'>>>np.argwhere({polymer} not in ortho_disc) \nOut: {breakpoint}\n\n\n')
Output:
>>> ortho_disc
Out:
[[1 1 1 0 0 0 0 0 0]
[1 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 2 2 0]]
>>> 2 not in [1 1 1 0 0 0 0 0 0]
Out: True
>>> 2 not in [1 0 1 1 0 0 0 0 0]
Out: True
>>> 2 not in [0 0 0 0 0 0 2 2 0]
Out: False
>>>np.argwhere(2 not in ortho_disc)
Out: []
Expected output
From the bottom two lines:
breakpoint = np.argwhere(polymer not in ortho_disc)
print(f'>>>np.argwhere({polymer} not in ortho_disc) \nOut: {breakpoint}\n\n\n')
I expect the following output:
>>>np.argwhere(2 not in ortho_disc)
Out: [0, 1]
Summary
I would really love feedback on how to solve this issue, as I have been scratching my head for ages over what seems to be an easy problem. As I mentioned, I would like to avoid the obvious 'easy-way-out' loop over ortho_disc, preferably by using numpy.
Thanks in advance!
Answer:
In [13]: ortho_disc
Out[13]:
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 0]])
In [14]: polymer = 2
In [15]: (ortho_disc != polymer).all(axis=1).nonzero()[0]
Out[15]: array([0, 1])
Breaking it down: ortho_disc != polymer is an array of bools:
In [16]: ortho_disc != polymer
Out[16]:
array([[ True, True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, False, False, True]])
We want the rows that are all True; for that, we can apply the all() method along axis 1 (i.e. along the rows):
In [17]: (ortho_disc != polymer).all(axis=1)
Out[17]: array([ True, True, False])
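The same mask can also be phrased as "no element equals polymer", which is De Morgan's law applied row-wise. A minimal sketch comparing the two forms (mask_all and mask_any are illustrative names):

```python
import numpy as np

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])
polymer = 2

# "every element in the row differs from polymer"
mask_all = (ortho_disc != polymer).all(axis=1)
# "no element in the row equals polymer" (negated any)
mask_any = ~(ortho_disc == polymer).any(axis=1)

print(mask_all)                            # [ True  True False]
print(np.array_equal(mask_all, mask_any))  # True
```

Either form works; which one reads better is mostly a matter of taste.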
That's the boolean mask for the rows that do not contain polymer.
Use nonzero() to find the indices of the values that are not 0 (True is considered nonzero, False is 0):
In [19]: (ortho_disc != polymer).all(axis=1).nonzero()
Out[19]: (array([0, 1]),)
Note that nonzero() returned a tuple of length 1; in general, it returns a tuple with the same length as the number of dimensions of the array. Here the input array is 1-d. Pull the desired result out of the tuple by indexing with [0]:
In [20]: (ortho_disc != polymer).all(axis=1).nonzero()[0]
Out[20]: array([0, 1])
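For a 1-d mask, np.flatnonzero returns the index array directly, so the [0] is not needed. A small sketch wrapping the whole recipe in a reusable function; the helper name rows_without is hypothetical and not part of numpy:

```python
import numpy as np

def rows_without(arr, value):
    """Return indices of rows of a 2-D array that contain no element equal to value.

    Hypothetical helper for illustration only.
    """
    mask = (arr != value).all(axis=1)  # one bool per row
    return np.flatnonzero(mask)        # indices of the True entries, no tuple

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])

print(rows_without(ortho_disc, 2))  # [0 1]
print(rows_without(ortho_disc, 1))  # [2]
print(rows_without(ortho_disc, 0))  # [] (every row contains a 0)
```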
Answer:
You can use numpy broadcasting for this. ortho_disc != 2 will return a mask of the array where each value is True if that value in the array was not 2, and False if it was 2. Then use np.all with axis=1 to condense each row into a boolean indicating whether or not that row contained only True values (True value = no 2 there):
>>> np.all(ortho_disc != 2, axis=1)
array([ True, True, False])
If you want to get the indexes out of that, just use np.where with the above:
>>> np.where(np.all(ortho_disc != 2, axis=1))[0]
array([0, 1])
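Since the question mentions np.isin, the same row-wise idea extends to excluding several values at once. A sketch, assuming a hypothetical list of excluded values [2, 3] that is not part of the original question:

```python
import numpy as np

ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 0, 1, 1, 0, 0, 0, 0, 0],
                       [0, 0, 0, 0, 0, 0, 2, 2, 0]])

# Hypothetical values to exclude (illustration only)
excluded = [2, 3]

# np.isin marks every element matching any excluded value;
# a row qualifies only if none of its elements are marked
mask = ~np.isin(ortho_disc, excluded).any(axis=1)
print(np.where(mask)[0])  # [0 1]
```

With a single value, np.isin(ortho_disc, [2]) gives the same mask as ortho_disc == 2, so the answers above are the special case.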