I have a big array of integers and a second array of arrays. I want to create a boolean mask for the first array based on the data in the second array of arrays. Preferably I would use numpy.isin,
but its documentation clearly states:
The values against which to test each value of element. This argument is flattened if it is an array or array_like. See notes for behavior with non-array-like parameters.
Do you know of a performant way of doing this that avoids a list comprehension?
So, for example, given these arrays:
a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
I would like to get a result like:
np.array([
[True, True, False, False, False, False, False, False, False, False],
[False, False, True, True, False, False, False, False, False, False],
[False, False, False, False, True, True, False, False, False, False],
[False, False, False, False, False, False, True, True, False, False],
[False, False, False, False, False, False, False, False, True, True]
])
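For reference, the list-comprehension baseline mentioned above could look roughly like the sketch below. It is only an illustration: the variable name `mask` and the use of `a[0]` to drop the leading length-1 axis of `a` are assumptions, not part of the original question.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

# One np.isin call per row of b. a[0] is the 1-D view of a, so each
# result row is 1-D and the stacked mask has one row per row of b.
mask = np.array([np.isin(a[0], row) for row in b])

print(mask.shape)  # (5, 10)
```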
CodePudding user response:
Try numpy.apply_along_axis together with numpy.isin:
np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b)
returns
array([[[ True, True, False, False, False, False, False, False, False, False]],
[[False, False, True, True, False, False, False, False, False, False]],
[[False, False, False, False, True, True, False, False, False, False]],
[[False, False, False, False, False, False, True, True, False, False]],
[[False, False, False, False, False, False, False, False, True, True]]])
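Note that the result above has an extra length-1 axis, because `a` is 2-D with shape (1, 10) and each `np.isin(a, x)` call returns an array of that shape. A minimal sketch of getting the 2-D mask from the question, assuming the same `a` and `b` (the names `res` and `mask` are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

# Each row x of b is passed to the lambda; the (1, 10) results are stacked.
res = np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b)
print(res.shape)  # (5, 1, 10)

# Drop the singleton axis that comes from a's leading dimension.
mask = res.squeeze(axis=1)
print(mask.shape)  # (5, 10)
```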
EDIT:
I compared the runtime against a list comprehension, and wouldn't you know, the list comp is faster:
timeit.timeit("[np.isin(a,x) for x in b]",number=10000, globals=globals())
0.37380070000654086
vs
timeit.timeit("np.apply_along_axis(lambda x: np.isin(a, x), axis=1, arr=b) ",number=10000, globals=globals())
0.6078917000122601
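The other answer to this post, by @mozway, appears much faster (note that this run used number=100 rather than number=10000, so the totals are not directly comparable):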
timeit.timeit("(a == b[...,None]).any(-2)",number=100, globals=globals())
0.007107900004484691
and should probably be accepted.
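Since the timings above were taken with different `number` values, a like-for-like comparison might be sketched as follows. The dictionary `candidates`, the repeat count `n`, and the per-call reporting are assumptions added for illustration. Results will depend on the machine and on array sizes, so no numbers are quoted here.

```python
import timeit

import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

# The three approaches from this thread, wrapped as zero-argument callables.
candidates = {
    "list comprehension": lambda: [np.isin(a, x) for x in b],
    "apply_along_axis": lambda: np.apply_along_axis(
        lambda x: np.isin(a, x), axis=1, arr=b
    ),
    "broadcasting": lambda: (a == b[..., None]).any(-2),
}

n = 1000  # same number of calls for every candidate
timings = {name: timeit.timeit(fn, number=n) for name, fn in candidates.items()}

for name, total in timings.items():
    print(f"{name}: {total / n * 1e6:.1f} us per call")
```

With arrays this small the fixed call overhead dominates, so it is worth rerunning the comparison on data of realistic size before choosing an approach.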
CodePudding user response:
You can use broadcasting to avoid any Python-level loop (this uses more memory, however, because it builds an intermediate boolean array before reducing it):
(a == b[...,None]).any(-2)
Output:
array([[ True, True, False, False, False, False, False, False, False, False],
[False, False, True, True, False, False, False, False, False, False],
[False, False, False, False, True, True, False, False, False, False],
[False, False, False, False, False, False, True, True, False, False],
[False, False, False, False, False, False, False, False, True, True]])
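To make the broadcasting and its memory cost more concrete, here is a step-by-step sketch of the same expression, assuming the `a` and `b` from the question. The intermediate names `b3`, `eq` and `expected` are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
b = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

b3 = b[..., None]       # add a trailing axis: shape (5, 2, 1)
eq = a == b3            # (1, 10) broadcast against (5, 2, 1) -> (5, 2, 10)
mask = eq.any(axis=-2)  # "does any value in this row of b match?" -> (5, 10)

print(b3.shape, eq.shape, mask.shape)  # (5, 2, 1) (5, 2, 10) (5, 10)

# Cross-check against one np.isin call per row of b.
expected = np.array([np.isin(a[0], row) for row in b])
print(np.array_equal(mask, expected))  # True
```

The intermediate `eq` holds one boolean per (row of `b`, value in that row, element of `a`) combination, which is where the extra memory goes when `a` is large.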