How to vectorize such an algorithm in python?-CodePudding

I have four (nx1) dimensional arrays named a, b, c, and F. I want to run this algorithm without any loops.

for i in range(n):
    if a[i] < b[i]:
        F[i] = 0
    elif a[i] > c[i]:
        F[i] = 1
    elif b[i] <= a[i] <= c[i]:
        F[i] = 2

I want to write this code in a more vectorized way to make it more efficient as my dataset is quite large.

CodePudding user response：

I feel like you could use boolean indexing for this task.

F[np.logical_and(b <= a, a <= c)] = 2
F[a > c] = 1
F[a < b] = 0

Beware, the affectation order is important here to get the desired outcomes.

Some timeit benchmark:

def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] > c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

def idx(F, a, b, c):
  F[np.logical_and(b <= a, a <= c)] = 2
  F[a > c] = 1
  F[a < b] = 0

with (10x1) array:

>>> timeit.timeit(lambda: loop(F, a, b, c))
11.585818066001593
>>> timeit.timeit(lambda: idx(F, a, b, c))
3.337863392000145

with (1000x1) array:

>>> timeit.timeit(lambda: loop(F, a, b, c))
1457.268110728999
>>> timeit.timeit(lambda: idx(F, a, b, c))
10.00236530300026

CodePudding user response：

If you care about performance, why not try numba? It might get 10X faster than logical operations while saving memory at the same time. As a bonus, the loop code you wrote will be kept intact, only through an @njit decorator in front of the function.

from numba import njit

@njit
def loop(F, a, b, c):
  for i in range(F.shape[0]):
    if a[i] < b[i]:
      F[i] = 0
    elif a[i] < c[i]:
      F[i] = 1
    elif b[i] <= a[i] <= c[i]:
      F[i] = 2

Compare with vectorized solution by @NiziL using sizes of 100 and 1000 vectors,

timeit(lambda: loop(F, a, b, c))
timeit(lambda: idx(F, a, b, c))

Gives:

# 1.0355658 (Size: 100, @njit loop)
# 4.6863165 (Size: 100, idx)

# 1.9563843 (Size: 1000, @njit loop)
# 16.658198 (Size: 1000, idx)