I am looking for an efficient function to find equal elements at distance k inside a list.
Input example:
v=['a','c','d','e','a','c','e','e','e','d']
k=4
Desired output:
['a', 'c', 'e']
why: 'a' is both in position 0 and 4 [4-0=k], 'c' is both in position 1 and 5 [5-1=k], 'e' is both in position 3 and 7 [7-3=k]
If it's possible, I am looking for something more efficient than looping element by element with a for loop. That is, I'm looking for something better than the following:
def dist_k(k, v):
v_len = len(v)
out = []
for i in range(0,v_len,1):
if i k < v_len:
if v[i] == v[i k]:
out.append(v[i])
print(out)
CodePudding user response:
EDIT
I found a clever way to do that even faster, without the need of np.roll
, and that will not consider the array as circular :
>>> import numpy as np
>>> v = ['a','c','d','e','a','c','e','e','e','d']
>>> v_np = np.array(v)
>>> k = 4
>>> v_init = v_np[:-k]
>>> v_init[v_init == v_np[k:]]
array(['a', 'c', 'e'])
The idea is I store the begining of the array without the last k
elements (I will not need then as there are less than k
elements after them). I store the array as I will need it twice.
>>> v_init = v_np[:-k]
>>> v_init
array(['a', 'c', 'd', 'e', 'a', 'c'])
Then I use the second part of the array without the first k
elements:
>>> v_np[k:]
array(['a', 'c', 'e', 'e', 'e', 'd'])
Then I can compare the two arrays:
>>> v_init == v_np[k:]
array([ True, True, False, True, False, False])
And finally I can get the element corresponding to the index from v_init
(not from v
because the length of the arrays do not match):
>>> v_init[v_init == v_np[k:]]
array(['a', 'c', 'e'])
Performances As it uses numpy (which uses C for the computations), this method will be way faster with big arrays. However, due to the conversion to a numpy array, it can be slower for small lists. If you can have the list directly as a numpy array, go for this method it will be faster anyway I think.
def list_comprehension(v, k):
return [v[i] for i in range(len(v)-k) if v[i] == v[i k]]
def using_numpy(v, k):
v_np = np.array(v)
v_init = v[:-k]
return v_init[v_init == v[k:]]
>>> from timeit import timeit
>>> v = np.random.rand(100000).tolist() # I consider that the input is a simple list
>>> timeit(lambda: list_comprehension(v,k), number=1000)
7.875344538999343
>>> timeit(lambda: using_numpy(v,k), number=1000)
3.646597085000394
Original answer, for circular arrays
import numpy as np
v = np.array(['a','c','d','e','a','c','e','e','e','d'])
k = 4
v[v == np.roll(v,-k)]
Basically np.roll
shifts the array by k
elements, then I just compare the initial array with the shifted array, and I keep elements for which they are equal :
>>> np.roll(v, -k)
array(['a', 'c', 'e', 'e', 'e', 'd', 'a', 'c', 'd', 'e'])
>>> v == np.roll(v,-k)
array([ True, True, False, True, False, False, False, False, False,
False])
>>> v[v == np.roll(v,-k)]
array(['a', 'c', 'e'])
Note that this method will consider the array is circular, i.e. it considers the first and last values of the array as adjacent :
>>> v=np.array(['a','c','d','e','a','c','e','e','e','d', 'a'])
>>> v[v == np.roll(v,-k)]
array(['a', 'c', 'e', 'd'])
CodePudding user response:
You can try the following code:
v=['a','c','d','e','a','c','e','e','e','d']
k=4
result = [v[i] for i in range(len(v)-k) if v[i] == v[i k] ]
CodePudding user response:
I don't know if this is any more or less efficient than the other suggestions but it's another option:
v = ['a', 'c', 'd', 'e', 'a', 'c', 'e', 'e', 'e', 'd']
k = 4
print([e for e, f in zip(v, v[k:]) if e == f])
Output:
['a', 'c', 'e']