I have a NumPy array of shape (500000, 30), which can be viewed as a length-500000 list of size-30 vectors. For each vector I want to pick every combination of 4 elements, compute their product, and store all of the 4-element products. Finally I need the mean over the 500000 results.
I have tried it with np.einsum, but it runs really slowly. How can I improve the efficiency?
# array.shape = (500000,30)
expect = np.sum(np.einsum('ni,nj,nk,nr->ijkr',array,array,array,array),axis=0)/500000
CodePudding user response:
The sum of all possible products of 4 entries of a vector v is equal to the 4th power of the sum of the entries of v.
Note that this assumes that the same vector entry can appear more than once in a product, and that the sum includes products that differ only in the order of their factors (so e.g. v[1] * v[2] * v[3] * v[4] and v[4] * v[3] * v[2] * v[1] count as different products). Since your code performs the same computation, I assume that this is what you want. In any case, the value of
np.sum(np.einsum('ni,nj,nk,nr->ijkr', array, array, array, array))
is the same as that of
(array.sum(axis=1)**4).sum()
but the latter will be computed much faster.
In your code you take the sum along axis 0 of the 30x30x30x30 array produced by np.einsum, but I am not sure why.
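The identity can be checked numerically on a small random array (a minimal sketch; the sizes below are arbitrary stand-ins for the real (500000, 30) data):

```python
import numpy as np

rng = np.random.default_rng(0)
array = rng.normal(size=(100, 5))  # small stand-in for the (500000, 30) array

# Brute force: form every 4-index product via einsum, then sum everything
full = np.sum(np.einsum('ni,nj,nk,nr->ijkr', array, array, array, array))

# Identity: 4th power of each row sum, summed over rows
fast = (array.sum(axis=1) ** 4).sum()

assert np.allclose(full, fast)
```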
CodePudding user response:
You can compute the result much more efficiently by factoring the contraction over i out of the einsum: since i appears in only one operand, that contraction reduces to a simple row sum. Moreover, you can tell NumPy to optimize the einsum contraction order (at the expense of a higher latency, which is not a problem here). Here is the resulting code:
expect = np.einsum('n,nj,nk,nr->jkr', np.sum(array, axis=1), array, array, array, optimize=True) / 500000
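As a quick sanity check (on a small random array with arbitrary sizes), the factorized einsum agrees with the original four-operand expression:

```python
import numpy as np

rng = np.random.default_rng(1)
array = rng.normal(size=(200, 6))
n = array.shape[0]

# Original four-operand einsum, summed over the i axis
original = np.sum(np.einsum('ni,nj,nk,nr->ijkr', array, array, array, array), axis=0) / n

# Factorized form: the contraction over i is just a row sum
factored = np.einsum('n,nj,nk,nr->jkr', array.sum(axis=1), array, array, array, optimize=True) / n

assert np.allclose(original, factored)
```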
The factorized version is 63 times faster on my machine. If you want to optimize this further, you can perform the computation in parallel using multiple threads, since the default NumPy implementation is sequential. You can use Numba to do that. I expect the computation to be about 360 times faster on my 6-core machine (since the computation is compute-bound).