I have a sample array:
import numpy as np
a = np.array(
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12],
[13, 14, 15],
]
)
And an array of indices from which I would like to get averages:
b = np.array([[1,3], [1,2], [2,3]])
In addition, I need the final result to have the first row concatenated to each of these averages.
I can get the desired result using this:
np.concatenate( (np.tile(a[0],(3,1)), a[b].mean(1)), axis=1)
array([[ 1. , 2. , 3. , 7. , 8. , 9. ],
[ 1. , 2. , 3. , 5.5, 6.5, 7.5],
[ 1. , 2. , 3. , 8.5, 9.5, 10.5]])
I am wondering if there is a more computationally efficient way, as I've heard concatenate is slow (see e.g. Numpy concatenate is slow: any alternative approach?).
I'm thinking there might be a way with a combination of advanced indexing, .mean(), and reshape, but I am not able to come up with anything that gives the desired array.
CodePudding user response:
The problem is not that concatenate is slow. In fact, it is not so slow. The problem is using it in a loop to produce a growing array: that pattern is very inefficient because it creates many temporary arrays and copies. However, you do not use such a pattern here, so this is fine. In your case, concatenate is used properly and matches your intent exactly. You could create an array and fill the left and right parts separately, but that is essentially what concatenate does in the end.

That being said, concatenate has a fairly large overhead, mainly for small arrays (like most Numpy functions), because of many internal checks (needed to adapt its behaviour to the shapes of the input arrays). Moreover, the implicit casting of np.tile(a[0], (3, 1)) from np.int_ to np.float64 introduces another overhead. Finally, note that mean is not very optimized for such a case: it is faster to use (a[b[:,0]] + a[b[:,1]]) * 0.5, although the intent is less clear.
n, m = a.shape[1], b.shape[0]                  # n: columns of a, m: rows of b
res = np.empty((m, 2 * n), dtype=np.float64)
res[:, :n] = a[0]                              # Note: implicit int -> float conversion done here
res[:, n:] = (a[b[:, 0]] + a[b[:, 1]]) * 0.5   # Also here
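As a quick sanity check (this verification snippet is mine, not part of the original answer), the filled array can be compared against the concatenate-based result:

import numpy as np
a = np.arange(1, 16).reshape(5, 3)   # same sample array as in the question
b = np.array([[1, 3], [1, 2], [2, 3]])
expected = np.concatenate((np.tile(a[0], (3, 1)), a[b].mean(1)), axis=1)
n, m = a.shape[1], b.shape[0]
res = np.empty((m, 2 * n), dtype=np.float64)
res[:, :n] = a[0]
res[:, n:] = (a[b[:, 0]] + a[b[:, 1]]) * 0.5
print(np.allclose(res, expected))    # True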
The resulting operation is about 3 times faster on my machine with your example. That may not hold for big input arrays (although I expect a speed-up there too).

For big arrays, the best solution is to use Numba (or Cython) code with explicit loops, so as to avoid creating and filling big, expensive temporary arrays. Numba should also speed up the computation on small arrays because it mostly removes the overhead of the Numpy functions (I expect a speed-up of about 5x-10x here).
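A minimal sketch of such a Numba version, assuming Numba is installed (the function name and the assumption that b always holds pairs of row indices are mine, not from the original answer):

import numba as nb
import numpy as np

@nb.njit
def concat_first_row_with_pair_means(a, b):
    # b is assumed to hold pairs of row indices into a
    m, n = b.shape[0], a.shape[1]
    res = np.empty((m, 2 * n), dtype=np.float64)
    for i in range(m):
        r0, r1 = b[i, 0], b[i, 1]
        for j in range(n):
            res[i, j] = a[0, j]                          # copy the first row of a
            res[i, n + j] = (a[r0, j] + a[r1, j]) * 0.5  # pairwise mean of the indexed rows
    return res

After the first call triggers JIT compilation, concat_first_row_with_pair_means(a, b) should return the same array as the vectorized version above, without building any temporary arrays.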