Fill an array using the values of another array as the indices. If an index is repeated, prioritize-CodePudding

Description

I have an array a with N integer elements that range from 0 to M-1. I have another array b with N positive numbers.

Then, I want to create an array c with M elements. The i-th element of c should the index of a that has a value of i.

If more than one of these indices existed, then we take the one with a higher value in b.
If none existed, the i-th element of c should be -1.

Example

N = 5, M = 3

a = [2, 1, 1, 2, 2]
b = [1, 3, 5, 7, 3]

Then, c should be...

c = [-1, 5, 7]

My Solution 1

A possible approach would be to initialize an array d that stores the current max and then loop through a and b updating the maximums.

c = -np.ones(M)
d = np.zeros(M)
for i, (idx, val) in enumerate(zip(a, b)):
    if d[idx] <= val:
        c[idx] = i
        d[idx] = val

This solution is O(N) in time but requires iterating the array with Python, making it slow.

My Solution 2

Another solution would be to sort a using b as the key. Then, we can just assign a indices to c (max elements will be last).

sort_idx = np.argsort(b)

a_idx =  np.arange(len(a))
a = a[sort_idx]
a_idx = a_idx[sort_idx]

c = -np.ones(M)
c[a] = a_idx

This solution does not require Python loops but requires sorting b, making it O(N*log(N)).

Ideal Solution

Is there a solution to this problem in linear time without having to loop the array in Python?

CodePudding user response：

AFAIK, this cannot be implemented in O(n) currently with Numpy (mainly because the index table is both read and written regarding the value of another array). Note that np.argsort(b) can theoretically be implemented in O(n) using a radix sort, but such sort is not implemented yet in Numpy (it would not be much faster in practice due to the bad cache locality of the algorithm on big arrays).

One solution is to use Numba to speed up your algorithmically-efficient solution. Numba uses a JIT compiler to speed up loops. Here is an example (working with np.int32 types):

import numpy as np
import numba as nb

@nb.njit('int32[:](int32[:], int32[:])')
def compute(a, b):
    c = np.full(M, -1, dtype=np.int32)
    d = np.zeros(M, dtype=np.int32)
    for i, (idx, val) in enumerate(zip(a, b)):
        if d[idx] <= val:
            c[idx] = i
            d[idx] = val
    return c

a = np.array([2, 1, 1, 2, 2], dtype=np.int32)
b = np.array([1, 3, 5, 7, 3], dtype=np.int32)
c = compute(a, b)