Let's say we have a matrix, A, that has the following values:
In [2]: A
Out[2]:
array([[1, 1, 3],
       [1, 1, 5],
       [1, 1, 7],
       [1, 2, 3],
       [1, 2, 9],
       [2, 1, 5],
       [2, 2, 1],
       [2, 2, 8],
       [2, 2, 3]])
Is there a way to apply a function, e.g., np.mean, to the values of the third column over all rows that share the same values in the first and second columns, i.e., to get matrix B:
In [4]: B
Out[4]:
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 5],
       [2, 2, 4]])
My actual use case is much more complex. I have a large matrix with ~1M rows and 4 columns. The first three columns correspond to the (x, y, z) coordinates of a point in a point cloud, and the fourth column is the value of some function f, where f = f(x, y, z). I have to integrate along the x-axis (the first column of the matrix) for each group of rows that share the same (y, z) pair. The result should be a matrix with one row per unique (y, z) pair and three columns: y, z, and the value obtained from the integration. I have a few ideas, but all of them involve multiple for-loops and potential memory issues.
Is there any way to perform this in a vectorized fashion?
CodePudding user response:
You can use pandas if you have a lot of data:
import pandas as pd
df = pd.DataFrame(A, columns = ['id1','id2' ,'value'])
B = df.groupby(['id1','id2'])['value'].mean().reset_index().to_numpy()
Output:
[[1. 1. 5.]
[1. 2. 6.]
[2. 1. 5.]
[2. 2. 4.]]
I assume this is the fastest way.
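For the integration use case described in the question, the same groupby idea could be extended roughly as in the sketch below. It is only a sketch under assumptions: the column names x, y, z, f and the function name integrate_along_x are made up for illustration, and the trapezoidal rule is assumed to be an acceptable integration scheme.

```python
import numpy as np
import pandas as pd

def integrate_along_x(points):
    """Trapezoidal integral of f along x for every unique (y, z) pair.

    `points` is assumed to be an (N, 4) array with columns x, y, z, f.
    """
    df = pd.DataFrame(points, columns=["x", "y", "z", "f"])
    # Sort so that x increases within each (y, z) group
    df = df.sort_values(["y", "z", "x"])
    grouped = df.groupby(["y", "z"])
    # Step in x and previous f value, computed within each group;
    # the first row of every group gets NaN
    dx = grouped["x"].diff()
    f_prev = grouped["f"].shift()
    # Area of each trapezoid between consecutive points of a group
    df["area"] = 0.5 * (df["f"] + f_prev) * dx
    # sum() skips the NaN of each group's first row
    return df.groupby(["y", "z"])["area"].sum().reset_index().to_numpy()

# Small hand-checkable example (rows deliberately unsorted)
points = np.array([
    [2.0, 0.0, 0.0, 4.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [2.0, 0.0, 1.0, 3.0],
])
integrals = integrate_along_x(points)
print(integrals)
```

Each output row holds y, z and the integral. A group with a single point integrates to 0, since no trapezoid can be formed from it.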
CodePudding user response:
A possible solution:
import numpy as np
A = np.array([[1, 1, 3],
[1, 1, 5],
[1, 1, 7],
[1, 2, 3],
[1, 2, 9],
[2, 1, 5],
[2, 2, 1],
[2, 2, 8],
[2, 2, 3]])
# Unique (first, second) column pairs, one per output row
uniquePairs = np.unique(A[:,:2], axis=0)
output = np.empty((uniquePairs.shape[0], A.shape[1]))
for iPair, pair in enumerate(uniquePairs):
    output[iPair,:2] = pair
    # Mean of the third column over the rows matching this pair
    output[iPair,2] = np.mean(A[np.logical_and(A[:,0]==pair[0], A[:,1]==pair[1]),2])
print(output)
The output is
[[1. 1. 5.]
[1. 2. 6.]
[2. 1. 5.]
[2. 2. 4.]]
There is also a more compact variation, though it is perhaps less readable:
uniquePairs = np.unique(A[:,:2], axis=0)
output = np.array([[*pair, np.mean(A[np.logical_and(A[:,0]==pair[0], A[:,1]==pair[1]),2])] for pair in uniquePairs])
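Both variations still loop over the unique pairs in Python. If that becomes a bottleneck, one loop-free alternative is sketched below. It relies on np.unique with return_inverse=True to label each row with its group, and on np.bincount to accumulate per-group sums and counts. The function name grouped_mean is made up for illustration, and the approach only covers the mean (or other sum-based reductions), not arbitrary functions.

```python
import numpy as np

def grouped_mean(A):
    """Mean of the third column for each unique (first, second) pair."""
    # inverse[i] is the index of row i's pair within `keys`
    keys, inverse = np.unique(A[:, :2], axis=0, return_inverse=True)
    # Flatten defensively: the shape of `inverse` has differed
    # between NumPy releases when `axis` is given
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=A[:, 2])
    counts = np.bincount(inverse)
    return np.column_stack([keys, sums / counts])

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])
B = grouped_mean(A)
print(B)
```

For the sample A this should reproduce the matrix B from the question.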
CodePudding user response:
import numpy as np
A = np.array(
[
[1, 1, 3],
[1, 1, 5],
[1, 1, 7],
[1, 2, 3],
[1, 2, 9],
[2, 1, 5],
[2, 2, 1],
[2, 2, 8],
[2, 2, 3],
]
)
result = np.mean(A[:, 2], where=A[:, 0] == A[:, 1])
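This might be what you're looking for. You can use A[:, n] to access column n. Note that the where= mask above selects the rows whose first and second columns hold the same value, so result is a single scalar (4.5 for this A) rather than one mean per (first, second) pair.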
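A short sketch of what a where= mask does, for comparison with the per-pair result asked for in the question. The variable names same, overall, pair_mask and pair_mean are chosen here for illustration only.

```python
import numpy as np

A = np.array([[1, 1, 3],
              [1, 1, 5],
              [1, 1, 7],
              [1, 2, 3],
              [1, 2, 9],
              [2, 1, 5],
              [2, 2, 1],
              [2, 2, 8],
              [2, 2, 3]])

# Rows whose first and second columns hold the same value
same = A[:, 0] == A[:, 1]
# One scalar: the mean of the third column over all of those rows
overall = np.mean(A[:, 2], where=same)

# To get the mean for one specific pair, the mask has to name that pair
pair_mask = (A[:, 0] == 1) & (A[:, 1] == 2)
pair_mean = np.mean(A[:, 2], where=pair_mask)

print(overall, pair_mean)
```

Getting one mean per pair therefore needs one such mask per unique pair, or a grouping step.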