I would like to calculate the log-ratios for my 2D array, e.g.
a = np.array([[3,2,1,4], [2,1,1,6], [1,5,9,1], [7,8,2,2], [5,3,7,8]])
The formula is ln(x/g(x)), where g(x) is the geometric mean of each row. I execute it like this:
logvalues = np.array(a, dtype=float)  # the values will be overwritten by the loop below
for i in range(len(a)):
    row = np.array(a[i])
    geo_mean = row.prod()**(1.0/len(row))
    flr = lambda x: math.log(x/geo_mean)
    logvalues[i] = np.array([flr(x) for x in row])  # store the result for row i
I was wondering if there is any way to vectorise the above lines (preferably without introducing other modules) to make it more efficient?
CodePudding user response:
This should do the trick:
geo_means = a.prod(1)**(1/a.shape[1])
logvalues = np.log(a/geo_means[:, None])
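To make the role of [:, None] concrete, here is a minimal sketch of the shapes involved, using the array from the question. It also includes a log-domain variant, which is my own addition rather than something the question asked for: it relies on the identity ln(x/g(x)) = ln(x) - mean(ln(x)) and never forms the row product, which may matter if the rows are long enough for the product to overflow.

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

# One geometric mean per row: shape (5,)
geo_means = a.prod(axis=1) ** (1 / a.shape[1])

# [:, None] reshapes (5,) to (5, 1), so broadcasting (5, 4) / (5, 1)
# divides every row of a by that row's own geometric mean.
logvalues = np.log(a / geo_means[:, None])

# Log-domain variant (an assumption of this sketch, not part of the answer):
# ln(x / g(x)) == ln(x) - mean(ln(x)), computed without the row product.
log_a = np.log(a)
logvalues_logdomain = log_a - log_a.mean(axis=1, keepdims=True)

print(geo_means.shape, geo_means[:, None].shape)
print(np.allclose(logvalues, logvalues_logdomain))
```

Both variants should agree up to floating-point rounding; which one to prefer depends on how large the row products can get.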
CodePudding user response:
Another way you could do this is to just write the function as though for a single 1-D array, ignoring the 2-D aspect:
def f(x):
    return np.log(x / x.prod()**(1.0 / len(x)))
Then if you want to apply it to all rows in a 2-D array (or N-D array):
>>> np.apply_along_axis(f, 1, a)
array([[ 0.30409883, -0.10136628, -0.79451346, 0.5917809 ],
[ 0.07192052, -0.62122666, -0.62122666, 1.17053281],
[-0.95166562, 0.65777229, 1.24555895, -0.95166562],
[ 0.59299864, 0.72653003, -0.65976433, -0.65976433],
[-0.07391256, -0.58473818, 0.26255968, 0.39609107]])
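As a rough illustration of the N-D case, the same function can be applied along the last axis of a 3-D array. The array b below is a made-up example built by stacking the question's array with a reversed copy of itself; it is not from the question. Keep in mind that np.apply_along_axis calls f once per 1-D slice, so it is a convenience rather than a speed-up.

```python
import numpy as np

def f(x):
    # Log-ratio of a 1-D array against its own geometric mean
    return np.log(x / x.prod() ** (1.0 / len(x)))

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

# Hypothetical 3-D input: two (5, 4) blocks, the second with the rows reversed
b = np.stack([a, a[::-1]])

# Apply f to every 1-D slice along the last axis; the output keeps b's shape
out = np.apply_along_axis(f, -1, b)

print(out.shape)
```

Each slice along the last axis is treated exactly like one row of the 2-D example, so out[0] matches the result shown above.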
Some other general notes on your attempt:
for i in range(len(a)): If you want to loop over all rows in an array, it's generally faster to simply do for row in a. NumPy can optimize this case somewhat, whereas if you do for idx in range(len(a)), then for each index you have to index the array again with a[idx], which is slower. But even then it's better not to use a for loop at all where possible, which you already know.

row = np.array(a[i]): The np.array() isn't necessary. If you index a multi-dimensional array, the returned value is already an array.

lambda x: math.log(x/geo_mean): Don't use functions from the math module with NumPy arrays. Use the equivalents in the numpy module. Wrapping this in a function adds unnecessary overhead as well. Since you use it as [flr(x) for x in row], that's just equivalent to the already vectorized NumPy operation np.log(row / geo_mean).
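Putting these notes together, the loop from the question could be written roughly as follows. This is only a sketch to illustrate the points above; the use of enumerate and np.empty is my choice here, and the fully vectorized versions remain preferable.

```python
import numpy as np

a = np.array([[3, 2, 1, 4], [2, 1, 1, 6], [1, 5, 9, 1], [7, 8, 2, 2], [5, 3, 7, 8]])

# Float output buffer; an integer array would truncate the log values
logvalues = np.empty(a.shape, dtype=float)

# Iterate over the rows directly instead of indexing with range(len(a))
for i, row in enumerate(a):
    geo_mean = row.prod() ** (1.0 / len(row))
    # np.log works on the whole row at once, so no lambda or math.log is needed
    logvalues[i] = np.log(row / geo_mean)

print(logvalues)
```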