As part of a more complicated function that I want to compile with numba, I have to index an array A with another array idx.
Importantly, the dimension of the array A is variable. The shape can be (N), (N,N), or (N,N,N) etc.
In python, I can do so using tuples:
import numpy as np

def test():
    A = np.arange(5*5*5).reshape(5,5,5)
    idx = np.array([0,2,4])
    return A[tuple(idx)]
However, indexing an array with a tuple is apparently not supported by numba since I get the following error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<class 'tuple'>) found for signature:
>>> tuple(array(int64, 1d, C))
Note that this is just a minimal working example. In general, I do not know the length of idx.
I thought of reshaping A to a vector and converting idx to a corresponding scalar index.
Is that the best solution or is there a simple "numba alternative" to indexing with a tuple?
CodePudding user response:
Importantly, the dimension of the array A is variable. The shape can be (N), (N,N), or (N,N,N) etc.
The dimension of an array is part of its type in Numba. This means that A arrays with different dimensions have different types, so a function working on them needs to be compiled multiple times, once per dimension. Indeed, Numba needs to assign a well-defined type to every input/output parameter (and internal variable) of a compiled function. With a variable number of dimensions defined at runtime, you need a variable number of compiled functions and a Python wrapper to pick the right implementation. The thing is, compiling a function is quite expensive, and the biggest problem is the indexing of arrays, as you pointed out.
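To illustrate the "one compiled function per dimension plus a Python wrapper" idea, here is a minimal sketch. The names _get_1d, _get_2d, _get_3d, _GETTERS and get_item are hypothetical, the list of supported dimensions is arbitrary, and the fallback njit is only an assumption so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


# One specialised function per supported dimension. Inside each one the
# number of indices is fixed, so Numba can type the indexing expression.
@njit
def _get_1d(A, idx):
    return A[idx[0]]


@njit
def _get_2d(A, idx):
    return A[idx[0], idx[1]]


@njit
def _get_3d(A, idx):
    return A[idx[0], idx[1], idx[2]]


# Hypothetical lookup table used by the pure-Python wrapper.
_GETTERS = {1: _get_1d, 2: _get_2d, 3: _get_3d}


def get_item(A, idx):
    """Pick the implementation matching A.ndim (runs outside Numba)."""
    try:
        getter = _GETTERS[A.ndim]
    except KeyError:
        raise NotImplementedError("no specialisation for ndim=%d" % A.ndim)
    return getter(A, idx)


A3 = np.arange(5 * 5 * 5).reshape(5, 5, 5)
A2 = np.arange(5 * 5).reshape(5, 5)
result_3d = get_item(A3, np.array([0, 2, 4]))
result_2d = get_item(A2, np.array([1, 2]))
```

The wrapper itself stays in Python, so the dispatch cost is paid on every call; this only makes sense when each call does enough work inside the compiled function.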
In python, I can do so using tuples
This is because Python operates with dynamically-typed objects. But this is also one of the main reasons why Python is so slow compared to Numba.
indexing an array with a tuple is apparently not supported by numba
Indexing an array with a tuple should be supported, but creating a tuple from a dynamic variable-sized array is not possible in Numba (and will certainly never be). Indeed, the type of a tuple (required at compile time) is basically composed of the list of its item types, so it depends on the number of items, which is a dynamic property here (known only at runtime). For more information about this, please read: Numba Creating a tuple from a list.
Is that the best solution or is there a simple "numba alternative" to indexing with a tuple?
Overall, you should certainly give up the idea of operating on arrays of variable dimension using Numba since it is not designed for this. That being said, this does not mean you cannot solve this kind of issue with Numba.
One solution is to operate on flattened arrays and flattened indices. The idea is to flatten A with A.ravel() and then give it to Numba. The same can be done with the shape: A.shape can be passed as a dynamic 1D array to Numba using shape = np.array(A.shape, dtype=np.int64) (performed outside the Numba function). I assume that the array is contiguous for the sake of clarity (and sanity). The Numba function can then access a given item of A using an expression like:
A[idx[0]*strides[0] + idx[1]*strides[1] + ... + idx[n-1]*strides[n-1]]
where n = len(shape). The strides can be precomputed once using:
strides = np.empty(n, dtype=np.int64)
for i in range(n):
    strides[i] = 1
    for j in range(i + 1, n):
        strides[i] *= shape[j]
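As a sanity check on the stride computation, the sketch below compares the result with what NumPy itself reports for a C-contiguous array (A.strides is in bytes, so it is divided by A.itemsize), and with np.ravel_multi_index, which computes the same flat position in pure NumPy. The 5x5x5 array is just the example from the question, and numpy_strides and flat_pos are illustrative names.

```python
import numpy as np

A = np.arange(5 * 5 * 5).reshape(5, 5, 5)
shape = np.array(A.shape, dtype=np.int64)
n = len(shape)

# Element strides of a C-contiguous array: the product of all trailing sizes.
strides = np.empty(n, dtype=np.int64)
for i in range(n):
    strides[i] = 1
    for j in range(i + 1, n):
        strides[i] *= shape[j]

# NumPy stores strides in bytes; dividing by the item size gives element strides.
numpy_strides = np.array(A.strides) // A.itemsize

# Pure-NumPy equivalent of the flat position for the index (0, 2, 4).
flat_pos = np.ravel_multi_index((0, 2, 4), A.shape)
```

This check runs outside Numba; it is only meant to confirm that the hand-written strides match NumPy's layout when the array is contiguous.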
You can compute the flattened index at runtime using a basic loop:
pos = np.int64(0)
for i in range(n):
    pos += np.int64(idx[i]) * strides[i]
# A[pos] is then the requested item of the flattened array
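Putting the pieces together, here is a minimal end-to-end sketch of the flattened approach applied to the example from the question. The function names flat_strides and get_item are hypothetical, the array is assumed to be C-contiguous, and the fallback njit is only an assumption so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def flat_strides(shape):
    """Element strides of a C-contiguous array with the given shape."""
    n = shape.size
    strides = np.empty(n, dtype=np.int64)
    for i in range(n):
        strides[i] = 1
        for j in range(i + 1, n):
            strides[i] *= shape[j]
    return strides


@njit
def get_item(flat_A, strides, idx):
    """Return the item of the flattened array addressed by the index array."""
    pos = np.int64(0)
    for i in range(idx.size):
        pos += np.int64(idx[i]) * strides[i]
    return flat_A[pos]


A = np.arange(5 * 5 * 5).reshape(5, 5, 5)
idx = np.array([0, 2, 4])

# Shape and flattening are prepared outside the compiled functions.
shape = np.array(A.shape, dtype=np.int64)
strides = flat_strides(shape)
value = get_item(A.ravel(), strides, idx)
```

Because both functions only ever see 1D arrays, the same compiled code serves any number of dimensions; the strides should be computed once and reused across lookups.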
Note that doing such an operation is inefficient, especially for contiguous accesses. That being said, dynamic variable-sized arrays are fundamentally inefficient. You can compile the Numba code for some specific n values though, in order to speed up these specific cases (e.g. arrays with a small number of dimensions like <=8). For high-order arrays with a non-constant dimension, there is AFAIK no fast generic way to do that. You need to operate on (relatively large) contiguous slices of the flattened array so as to help Numba generate faster code.