I have 3 arrays, x, y, and q. Arrays x and y have the same length; q is a query array. Assume all values in x and q are unique. For each value of q, I would like to find the index of the corresponding value in x. I would then like to query that index in y. If a value from q does not appear in x, I would like to return np.nan.
As a concrete example, consider the following arrays:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])
Since only the value 2 occurs in x, the correct return value would be:
out = np.array([5, np.nan])
With for loops, this can be done like so:
out = []
for i in range(len(q)):
    for j in range(len(x)):
        if np.allclose(q[i], x[j]):
            out.append(y[j])
            break
    else:
        # inner loop finished without a break: no match for q[i]
        out.append(np.nan)
output = np.array(out)
Obviously this is quite slow. Is there a simpler way to do this with numpy builtins like np.argwhere? Or would it be easier to use pandas?
CodePudding user response:
I think you could solve this in one line, using a single for loop and some broadcasting:
out = [y[bl].item() if bl.any() else None for bl in x[None,:]==q[:,None] ]
This seems to me an elegant solution, but it is a little confusing to read, so I will go through it part by part.
- x[None,:]==q[:,None] compares every value in q with every value in x and returns a (len(q), len(x)) array of booleans (in this case, [[False,True,False], [False,False,False]]).
- You can index y with a boolean array of the same length, len(y), so you can call y[[False,True,False]] to get the value of y[1] (as a one-element array, hence the .item()).
- If the boolean array is all False, there is no match, so the if-else puts a None in its place (swap in np.nan to get the output asked for in the question).
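The steps above can be sketched as follows, using the arrays from the question. This is only an illustration: the intermediate name mask is my own, and np.nan is used in place of None so the result matches the output requested in the question.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])

# Compare every value of q (rows) with every value of x (columns).
mask = x[None, :] == q[:, None]  # shape (len(q), len(x))

# Each row of mask is a boolean index into y.
# A row with no True entry means that q value is absent from x.
out = np.array([y[row].item() if row.any() else np.nan for row in mask])

print(mask)
print(out)
```

Because one of the entries is np.nan, the resulting array has a float dtype even though y holds integers.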
CodePudding user response:
Here is how to do it with np.argwhere as well. Use whichever you are more comfortable with, pandas or numpy.
out_idx = [y[np.argwhere(x==value).reshape(-1)] for value in q]
out = [vals[0] if len(vals) else np.nan for vals in out_idx]
CodePudding user response:
Here's a way to do what your question asks:
query_results = pd.DataFrame(index=q).join(pd.DataFrame({'y':y}, index=x)).T.to_numpy()[0]
Output:
[ 5. nan]
CodePudding user response:
Numpy broadcasting should work.
# a mask that flags any matches
m = q == x[:, None]
# replace any value in q without any match in x by np.nan
res = np.where(m.any(0), y[:, None] * m, np.nan).sum(0)
res
# array([ 5., nan])
I should note that this only works if x has no duplicates.
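To see why duplicates matter, here is a small sketch with a hypothetical x in which the value 2 appears twice (this input is not from the question, which assumes unique values). Both matching rows survive the mask, and .sum(0) adds their y values together instead of picking one:

```python
import numpy as np

# Hypothetical input: x contains the value 2 twice.
x = np.array([1, 2, 2])
y = np.array([4, 5, 6])
q = np.array([2, 0])

# Same mask-and-sum approach as above.
m = q == x[:, None]
res = np.where(m.any(0), y[:, None] * m, np.nan).sum(0)

# The first entry is y[1] + y[2], not a single matched value.
print(res)
```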
Because it relies on building a len(x) x len(q) array, if q is large, the above solution will run into memory issues. Another pandas solution will work much more efficiently in that case:
# map q to y via x
res = pd.Series(q).map(pd.Series(y, index=x)).values
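For completeness, here is a self-contained sketch of the map approach on the arrays from the question. The name lookup is just for illustration, and the sketch relies on the question's assumption that the values in x are unique.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])

# Lookup table: the values of x are the index labels, the values of y are the data.
lookup = pd.Series(y, index=x)

# map() looks up each q value in the index of lookup;
# q values that are not present in the index come back as NaN.
res = pd.Series(q).map(lookup).to_numpy()

print(res)
```

No len(x) by len(q) intermediate array is built here, which is why this scales better for a large q.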