I'd like to take a vector and get an array of vectors in which the i-th element of each vector are the k neighbors of the i-th element of the original vector. Also, I'm looking for the fastest way to do so.
I've already done that in MATLAB:
a=zeros(k, length(v));
I=cell(1,k);
a(1,:) = v;
for j=2:k
a(k,:)=[a(k-1,2:end),a(k-1,1)];
end
aux1=[a(:,(end-r 1):end),a(:,1:(end-r))];
for j=1:k
I{k}=aux1(k,:);
end
For example, v = [1, 2, 3, 4, 5] and k = 1; and I want to get:
M = [[5, 1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5, 1]]
so that, for the 1st element of each vector, I get [5; 1; 2], which are the element 1 and its neighbors.
Hope it makes sense. Thanks for reading :)
CodePudding user response:
You could use the numpy roll
function:
import numpy as np
def get_neighbors(v, k):
N = len(v)
M = np.zeros((k*2 1, N), dtype=int)
for i in range(-k, k 1):
M[i k, :] = np.roll(v, -i)
return M
v = np.array([1, 2, 3, 4, 5])
k = 1
M = get_neighbors(v, k)
print(M)
Output:
[[5 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 1]]
CodePudding user response:
Using sliding_window_view
on a repetition of your array can do it "vectorized" way
# Example array
a = np.arange(1,16)
k = 2 # Window of neighbors
# My solution
np.lib.stride_tricks.sliding_window_view(np.hstack([a,a,a]), (len(a),))[len(a)-k:len(a) k 1]
Returns
array([[14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1],
[ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1, 2]])
Note that sliding_window_view
creates just a view. It doesn't create new data. Hence the reason why I do not hesitate creating (in this example) 31 lines (3*15-15 1), and then subset only 5 of them: I do not really create them.
So only real cost of that solution is in hstack
, both cpu-wise and memory-wise.
That subset, btw, was done to abide strictly by what you asked. But, depending on what you intend to do, you may drop the subset. Important point is that if
T=np.lib.stride_tricks.sliding_window_view(np.hstack([a,a,a]), (len(a),))
Then T[len(a) k]
is a row made of the kth neighbor, whether k is positive, negative or 0 (the original row)
See timings, since it matters for you
sizes | This method | Roll method |
---|---|---|
len=15/k=2 | 51 μs | 132 μs |
len=15/k=7 | 51 μs | 383 μs |
len=1000/k=7 | 52 μs | 422 μs |
len=1M/k=7 | 6 ms | 160 ms |
len=1M/k=100 | 6 ms | 2.2 s |
Roll method is obviously proportional to the size of the window (O(k) — it has one roll to perform per row of output), when sliding_window_view
is just a view, and does not really create rows, so is O(1)
as far as k is concerned. Both method are equally impacted by len of data (O(n) really, but it shows only for n big enough).
So, all together, this method is O(n) while roll method is O(kn)