Efficient function to detect equal elements at distance k-CodePudding

I am looking for an efficient function to find equal elements at distance k inside a list.

Input example:

v=['a','c','d','e','a','c','e','e','e','d']
k=4

Desired output:

['a', 'c', 'e']

why: 'a' is both in position 0 and 4 [4-0=k], 'c' is both in position 1 and 5 [5-1=k], 'e' is both in position 3 and 7 [7-3=k]

If it's possible, I am looking for something more efficient than looping element by element with a for loop. That is, I'm looking for something better than the following:

def dist_k(k, v):
    v_len = len(v)
    out = []
    for i in range(0,v_len,1):
        if i k < v_len:
            if v[i] == v[i k]:
                out.append(v[i])
    print(out)

CodePudding user response：

EDIT

I found a clever way to do that even faster, without the need of np.roll, and that will not consider the array as circular :

>>> import numpy as np
>>> v = ['a','c','d','e','a','c','e','e','e','d']
>>> v_np = np.array(v)
>>> k = 4
>>> v_init = v_np[:-k]
>>> v_init[v_init == v_np[k:]]
array(['a', 'c', 'e'])

The idea is I store the begining of the array without the last k elements (I will not need then as there are less than k elements after them). I store the array as I will need it twice.

>>> v_init = v_np[:-k]
>>> v_init
array(['a', 'c', 'd', 'e', 'a', 'c'])

Then I use the second part of the array without the first k elements:

>>> v_np[k:]
array(['a', 'c', 'e', 'e', 'e', 'd'])

Then I can compare the two arrays:

>>> v_init == v_np[k:]
array([ True,  True, False,  True, False, False])

And finally I can get the element corresponding to the index from v_init (not from v because the length of the arrays do not match):

>>> v_init[v_init == v_np[k:]]
array(['a', 'c', 'e'])

Performances As it uses numpy (which uses C for the computations), this method will be way faster with big arrays. However, due to the conversion to a numpy array, it can be slower for small lists. If you can have the list directly as a numpy array, go for this method it will be faster anyway I think.

def list_comprehension(v, k):
    return [v[i] for i in range(len(v)-k) if v[i] == v[i k]]

def using_numpy(v, k):
    v_np = np.array(v)
    v_init = v[:-k]
    return v_init[v_init == v[k:]]

>>> from timeit import timeit
>>> v = np.random.rand(100000).tolist()  # I consider that the input is a simple list
>>> timeit(lambda: list_comprehension(v,k), number=1000)
7.875344538999343
>>> timeit(lambda: using_numpy(v,k), number=1000)
3.646597085000394

Original answer, for circular arrays

import numpy as np
v = np.array(['a','c','d','e','a','c','e','e','e','d'])
k = 4
v[v == np.roll(v,-k)]

Basically np.roll shifts the array by k elements, then I just compare the initial array with the shifted array, and I keep elements for which they are equal :

>>> np.roll(v, -k)
array(['a', 'c', 'e', 'e', 'e', 'd', 'a', 'c', 'd', 'e'])
>>> v == np.roll(v,-k)
array([ True,  True, False,  True, False, False, False, False, False,
       False])
>>> v[v == np.roll(v,-k)]
array(['a', 'c', 'e'])

Note that this method will consider the array is circular, i.e. it considers the first and last values of the array as adjacent :

>>> v=np.array(['a','c','d','e','a','c','e','e','e','d', 'a'])
>>> v[v == np.roll(v,-k)]
array(['a', 'c', 'e', 'd'])

CodePudding user response：

You can try the following code:

    v=['a','c','d','e','a','c','e','e','e','d']
    k=4 

    result = [v[i] for i in range(len(v)-k) if v[i] == v[i k] ]

CodePudding user response：

I don't know if this is any more or less efficient than the other suggestions but it's another option:

v = ['a', 'c', 'd', 'e', 'a', 'c', 'e', 'e', 'e', 'd']
k = 4
print([e for e, f in zip(v, v[k:]) if e == f])

Output:

['a', 'c', 'e']