Vectorise a function for 2D numpy array

I would like to calculate the log-ratios for my 2D array, e.g.

    a = np.array([[3,2,1,4], [2,1,1,6], [1,5,9,1], [7,8,2,2], [5,3,7,8]])

The formula is ln(x/g(x)), where g(x) is the geometric mean of each row. I execute it like this:

    import math
    import numpy as np

    logvalues = np.array(a, dtype=float) # float copy; the values will be overwritten through the code below.
    for i in range(len(a)):
        row = np.array(a[i])
        geo_mean = row.prod()**(1.0/len(row))
        flr = lambda x: math.log(x/geo_mean)
        logvalues[i] = np.array([flr(x) for x in row]) # store row i instead of replacing the whole array

I was wondering if there is any way to vectorise the above lines (preferably without introducing other modules) to make it more efficient?

CodePudding user response：

This should do the trick:

    geo_means = a.prod(1)**(1/a.shape[1])
    logvalues = np.log(a/geo_means[:, None])
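
One thing to watch with a.prod(1): for long rows or large values the product can overflow the array's integer dtype. A minimal sketch of an equivalent form, assuming every entry is strictly positive, uses the identity ln(x/g(x)) = ln(x) - mean(ln(x)). The name log_ratios is only illustrative and not part of NumPy.

```python
import numpy as np

def log_ratios(a):
    """Row-wise ln(x / g(x)), where g(x) is the geometric mean of the row.

    Assumes every entry is strictly positive. log_ratios is a hypothetical
    helper name chosen for this sketch.
    """
    log_a = np.log(np.asarray(a, dtype=float))
    # ln(g(x)) equals the arithmetic mean of ln(x); keepdims=True keeps a
    # column shape so the means broadcast against each row.
    return log_a - log_a.mean(axis=1, keepdims=True)

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])
geo_means = a.prod(1) ** (1 / a.shape[1])

# Compare against the product-based form on the example array.
print(np.allclose(log_ratios(a), np.log(a / geo_means[:, None])))
```

Because it never forms the row product, this version avoids very large intermediate values, at the cost of taking the logarithm before the subtraction rather than after the division.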

CodePudding user response：

Another way to do this is to write the function as though for a single 1-D array, ignoring the 2-D aspect:

    def f(x):
        return np.log(x / x.prod()**(1.0 / len(x)))

Then if you want to apply it to all rows in a 2-D array (or N-D array):

    >>> np.apply_along_axis(f, 1, a)
    array([[ 0.30409883, -0.10136628, -0.79451346,  0.5917809 ],
           [ 0.07192052, -0.62122666, -0.62122666,  1.17053281],
           [-0.95166562,  0.65777229,  1.24555895, -0.95166562],
           [ 0.59299864,  0.72653003, -0.65976433, -0.65976433],
           [-0.07391256, -0.58473818,  0.26255968,  0.39609107]])
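
Note that np.apply_along_axis still calls f once per 1-D slice from Python, so it is more of a convenience than a true vectorisation. As a quick sanity check, here is a sketch that compares its result with a broadcasting version on the example array (the names per_row and broadcast are just for illustration):

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

def f(x):
    # Log-ratio of one 1-D row against its geometric mean.
    return np.log(x / x.prod() ** (1.0 / len(x)))

# f is applied to each row in turn.
per_row = np.apply_along_axis(f, 1, a)

# Same computation done for all rows at once via broadcasting.
geo_means = a.prod(axis=1) ** (1.0 / a.shape[1])
broadcast = np.log(a / geo_means[:, None])

print(np.allclose(per_row, broadcast))
```

For a handful of rows the difference is negligible; for arrays with many rows the broadcasting form is likely to be noticeably faster.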

Some other general notes on your attempt:

for i in range(len(a)): If you want to loop over all rows of an array, it is generally faster to write for row in a. Iterating directly hands you each row in turn, whereas with for idx in range(len(a)) you have to index the array again with a[idx] on every iteration, which is slower. Even so, it is better to avoid the for loop altogether where possible, as you already know.
row = np.array(a[i]): The np.array() call isn't necessary. Indexing a multi-dimensional array already returns an array, so wrapping the result in np.array() only makes an extra copy.
lambda x: math.log(x/geo_mean): Don't use math functions with NumPy arrays; use the equivalents in the numpy module. Wrapping this in a function adds unnecessary overhead as well. Since you use it as [flr(x) for x in row], it is equivalent to the already vectorized NumPy expression np.log(row / geo_mean).
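
Putting these notes together, a loop-based version might look roughly like the sketch below. It is only meant to illustrate the three points, with the result allocated as a float array (an assumption about what the original code intended), and the fully vectorised versions above remain preferable.

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

# Float output array; an integer array would truncate the log-ratios.
logvalues = np.empty(a.shape, dtype=float)

# Iterate over the rows directly; enumerate supplies the row index.
for i, row in enumerate(a):
    geo_mean = row.prod() ** (1.0 / len(row))
    # np.log works on the whole row at once, so no lambda or list
    # comprehension is needed.
    logvalues[i] = np.log(row / geo_mean)

print(logvalues)
```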