Is there a way to apply a function over all rows with the same values in a NumPy array?

Let's say we have a matrix, A, that has the following values:

In [2]: A
Out[2]: 
array([[1, 1, 3],
       [1, 1, 5],
       [1, 1, 7],
       [1, 2, 3],
       [1, 2, 9],
       [2, 1, 5],
       [2, 2, 1],
       [2, 2, 8],
       [2, 2, 3]])

Is there a way to apply a function, e.g., np.mean, to the third-column values of all rows that share the same values in the first and second columns, i.e., to get matrix B:

In [4]: B
Out[4]: 
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 5],
       [2, 2, 4]])
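The mapping from A to B is a grouped mean. As a minimal sketch, assuming NumPy 1.13 or later (for `np.unique(..., axis=0)`), B can be reproduced without a Python loop by labelling every row with the index of its (column 0, column 1) group and summing per group with `np.bincount`:

```python
import numpy as np

A = np.array([[1, 1, 3], [1, 1, 5], [1, 1, 7],
              [1, 2, 3], [1, 2, 9], [2, 1, 5],
              [2, 2, 1], [2, 2, 8], [2, 2, 3]])

# Unique (col0, col1) keys, plus the key index of every row.
keys, inverse = np.unique(A[:, :2], axis=0, return_inverse=True)
# Flatten to 1-D; some NumPy versions return the inverse with a different shape.
inverse = inverse.reshape(-1)

# Per-group sums of column 2 and per-group row counts, no Python loop.
sums = np.bincount(inverse, weights=A[:, 2])
counts = np.bincount(inverse)
B = np.column_stack([keys, sums / counts])
print(B)
```

Only reductions that can be built from sums (mean, sum, count, variance) fit this pattern directly; other functions need a different grouping step.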

My actual use case is much more complex. I have a large matrix with ~1M rows and 4 columns. The first three columns are the (x, y, z) coordinates of a point in a point cloud, and the fourth column is the value of some function f = f(x, y, z). I have to integrate along the x-axis (the first column of the matrix) for every distinct (y, z) pair. The result should be a matrix with one row per unique (y, z) pair and three columns: y, z, and the integrated value. I have a few ideas, but all of them involve multiple for-loops and potential memory issues.

Is there any way to perform this in a vectorized fashion?
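One way to vectorize the integration is to sort the points so that each (y, z) group is contiguous and ordered by x, compute the area between every pair of consecutive rows, discard the areas that straddle two groups, and sum the rest per group with `np.bincount`. This is a minimal sketch that assumes the trapezoid rule is an acceptable integration scheme and that the columns are laid out as x, y, z, f; `integrate_along_x` and the sample points are hypothetical.

```python
import numpy as np

def integrate_along_x(P):
    """Trapezoid-rule integral of f over x for every unique (y, z) pair.

    P: (n, 4) array with columns x, y, z, f (assumed layout).
    Returns an (m, 3) array with columns y, z, integral.
    """
    # Sort by y, then z, then x (np.lexsort treats the LAST key as primary).
    order = np.lexsort((P[:, 0], P[:, 2], P[:, 1]))
    x, y, z, f = P[order].T
    # Label each sorted row with the index of its (y, z) group.
    keys, group = np.unique(np.column_stack([y, z]), axis=0, return_inverse=True)
    group = group.reshape(-1)
    # Trapezoid area between each pair of consecutive sorted rows...
    area = 0.5 * (f[1:] + f[:-1]) * np.diff(x)
    # ...kept only when both rows belong to the same (y, z) group.
    same = group[1:] == group[:-1]
    integral = np.bincount(group[:-1][same], weights=area[same],
                           minlength=len(keys))
    return np.column_stack([keys, integral])

# Hypothetical unsorted sample: columns x, y, z, f.
P = np.array([[2.0, 1.0, 1.0, 6.0],
              [0.0, 2.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [3.0, 2.0, 1.0, 1.0],
              [1.0, 1.0, 1.0, 4.0]])
print(integrate_along_x(P))
```

The cost is dominated by the sort (O(n log n)), and the temporaries are a handful of length-n arrays, which should be manageable for ~1M rows. A group with a single point gets an integral of 0.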

Answer:

You can use pandas if you have a lot of data:

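import pandas as pd

# A is the (n, 3) NumPy array from the question.
df = pd.DataFrame(A, columns=['id1', 'id2', 'value'])
# Group rows by the first two columns and average the third.
B = df.groupby(['id1', 'id2'])['value'].mean().reset_index().to_numpy()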

Output:

[[1. 1. 5.]
 [1. 2. 6.]
 [2. 1. 5.]
 [2. 2. 4.]]

I assume this is the fastest way.
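The same groupby pattern extends to the integration in the actual use case by passing a custom function to `apply`. A minimal sketch, assuming columns named x, y, z, f and the trapezoid rule as the integration scheme; `trapezoid_integral` and the sample points are hypothetical:

```python
import numpy as np
import pandas as pd

def trapezoid_integral(g):
    # Sort this group's samples by x so the trapezoids are taken in order.
    g = g.sort_values("x")
    x = g["x"].to_numpy(dtype=float)
    f = g["f"].to_numpy(dtype=float)
    # Sum of trapezoid areas between consecutive samples (0.0 for one point).
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# Hypothetical point cloud with columns x, y, z, f.
P = np.array([[2.0, 1.0, 1.0, 6.0],
              [0.0, 2.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [3.0, 2.0, 1.0, 1.0],
              [1.0, 1.0, 1.0, 4.0]])
df = pd.DataFrame(P, columns=["x", "y", "z", "f"])

# One scalar per (y, z) group, returned as a Series indexed by (y, z).
out = (df.groupby(["y", "z"])[["x", "f"]]
         .apply(trapezoid_integral)
         .reset_index(name="integral")
         .to_numpy())
print(out)
```

`apply` calls the Python function once per group, so with many small groups it will be slower than a fully vectorized approach.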

CodePudding user response：

A possible solution:

import numpy as np

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])

# Unique (col0, col1) pairs, sorted lexicographically.
uniquePairs = np.unique(A[:, :2], axis=0)
output = np.empty((uniquePairs.shape[0], A.shape[1]))
for iPair, pair in enumerate(uniquePairs):
    output[iPair, :2] = pair
    # Mean of column 2 over the rows whose first two columns match this pair.
    output[iPair, 2] = np.mean(A[np.logical_and(A[:, 0] == pair[0], A[:, 1] == pair[1]), 2])

print(output)

The output is:

[[1. 1. 5.]
 [1. 2. 6.]
 [2. 1. 5.]
 [2. 2. 4.]]

There is also a more compact variation, though it is arguably less readable:

uniquePairs = np.unique(A[:, :2], axis=0)
output = np.array([[*pair, np.mean(A[np.logical_and(A[:, 0] == pair[0], A[:, 1] == pair[1]), 2])]
                   for pair in uniquePairs])
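Both versions above build a boolean mask over all rows for every pair, so the work grows with (number of pairs) × (number of rows). A sketch of an alternative that still accepts an arbitrary function but only touches each row once: use the `return_inverse` output of `np.unique`, sort rows by group with a stable sort, and split the sorted values at the group boundaries. `group_apply` is a hypothetical helper name.

```python
import numpy as np

A = np.array([[1, 1, 3], [1, 1, 5], [1, 1, 7],
              [1, 2, 3], [1, 2, 9], [2, 1, 5],
              [2, 2, 1], [2, 2, 8], [2, 2, 3]])

def group_apply(A, func):
    """Apply func to column 2 of each group of rows sharing columns 0 and 1."""
    keys, inverse = np.unique(A[:, :2], axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep the group labels 1-D
    # Stable sort so the rows of each group become contiguous.
    order = np.argsort(inverse, kind="stable")
    # Each group's size; the cumulative sums mark where the next group starts.
    counts = np.bincount(inverse, minlength=len(keys))
    chunks = np.split(A[order, 2], np.cumsum(counts)[:-1])
    values = np.array([func(chunk) for chunk in chunks])
    return np.column_stack([keys, values])

print(group_apply(A, np.mean))
print(group_apply(A, np.max))
```

The list comprehension still runs once per group in Python, but each call only sees that group's values instead of scanning the whole array.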

Answer:

import numpy as np

A = np.array(
    [
        [1, 1, 3],
        [1, 1, 5],
        [1, 1, 7],
        [1, 2, 3],
        [1, 2, 9],
        [2, 1, 5],
        [2, 2, 1],
        [2, 2, 8],
        [2, 2, 3],
    ]
)

result = np.mean(A[:, 2], where=A[:, 0] == A[:, 1])

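This computes the mean of the third column over all rows in which the first and second columns are equal to each other, giving a single scalar (4.5 for the A above) rather than one mean per pair. The `where=` argument of `np.mean` requires NumPy 1.20 or later. You can use A[:, n] to access a column.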
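If one mean per (first column, second column) pair is wanted instead, the same `where=` argument can be given a mask built once per unique pair. A minimal sketch, assuming NumPy 1.20 or later; A is redefined so the block runs on its own:

```python
import numpy as np

A = np.array([[1, 1, 3], [1, 1, 5], [1, 1, 7],
              [1, 2, 3], [1, 2, 9], [2, 1, 5],
              [2, 2, 1], [2, 2, 8], [2, 2, 3]])

pairs = np.unique(A[:, :2], axis=0)
# For each pair, average column 2 only where both key columns match the pair.
B = np.array([
    [*pair, np.mean(A[:, 2], where=(A[:, :2] == pair).all(axis=1))]
    for pair in pairs
])
print(B)
```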