I have the following example array of x-y coordinate pairs:
A = np.array([[0.33703753, 3.],
              [0.90115394, 5.],
              [0.91172016, 5.],
              [0.93230994, 3.],
              [0.08084283, 3.],
              [0.71531777, 2.],
              [0.07880787, 3.],
              [0.03501083, 4.],
              [0.69253184, 4.],
              [0.62214452, 3.],
              [0.26953094, 1.],
              [0.4617873 , 3.],
              [0.6495549 , 0.],
              [0.84531478, 4.],
              [0.08493308, 5.]])
My goal is to reduce this to an array with six rows by taking the average of the x-values for each y-value, like so:
array([[0.6495549 , 0.        ],
       [0.26953094, 1.        ],
       [0.71531777, 2.        ],
       [0.41882167, 3.        ],
       [0.52428582, 4.        ],
       [0.63260239, 5.        ]])
Currently I am achieving this by converting to a pandas dataframe, performing the calculation, and converting back to a numpy array:
>>> df = pd.DataFrame({'x':A[:, 0], 'y':A[:, 1]})
>>> df.groupby('y').mean().reset_index()
     y         x
0  0.0  0.649555
1  1.0  0.269531
2  2.0  0.715318
3  3.0  0.418822
4  4.0  0.524286
5  5.0  0.632602
Is there a way to perform this calculation using numpy, without having to resort to the pandas library?
CodePudding user response:
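Here is a workaround using only NumPy. np.unique with return_inverse=True returns the sorted unique y values plus, for every row, the index of its y value in that sorted list, so a boolean mask on those indices selects each group:
import numpy as np

# indices[k] is the position of row k's y value within unique_ys
unique_ys, indices = np.unique(A[:, 1], return_inverse=True)
result = np.empty((unique_ys.shape[0], 2))
for i, y in enumerate(unique_ys):
    result[i, 0] = np.mean(A[indices == i, 0])  # mean x of the rows in group i
    result[i, 1] = y
print(result)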
Alternative:
To make the code more Pythonic, you can build the result array with a list comprehension instead of an explicit for loop.
unique_ys, indices = np.unique(A[:, 1], return_inverse=True)
result = np.array([[np.mean(A[indices == i, 0]), y] for i, y in enumerate(unique_ys)])
print(result)
Output:
[[0.6495549  0.        ]
 [0.26953094 1.        ]
 [0.71531777 2.        ]
 [0.41882167 3.        ]
 [0.52428582 4.        ]
 [0.63260239 5.        ]]
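If the Python-level loop over groups becomes a bottleneck, the same grouping can probably be done without any loop by using np.bincount on the inverse indices. This is only a sketch under the assumption that A has the shape shown in the question; the names inverse, sums and counts are made up for illustration.

```python
import numpy as np

A = np.array([[0.33703753, 3.], [0.90115394, 5.], [0.91172016, 5.],
              [0.93230994, 3.], [0.08084283, 3.], [0.71531777, 2.],
              [0.07880787, 3.], [0.03501083, 4.], [0.69253184, 4.],
              [0.62214452, 3.], [0.26953094, 1.], [0.4617873, 3.],
              [0.6495549, 0.], [0.84531478, 4.], [0.08493308, 5.]])

# inverse[k] is the group number (index into unique_ys) of row k
unique_ys, inverse = np.unique(A[:, 1], return_inverse=True)

# Sum of the x values per group, and number of rows per group
sums = np.bincount(inverse, weights=A[:, 0])
counts = np.bincount(inverse)

# Mean x in column 0, y in column 1, as in the desired output
result = np.column_stack((sums / counts, unique_ys))
print(result)
```

Every group produced by np.unique contains at least one row, so the division should never hit a zero count.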
CodePudding user response:
If you know the y values beforehand, you can select the matching rows for each one with a boolean mask.
For example:
A[A[:, 1] == 1, 0]
will give you all the x values where the y value is equal to 1.
So you can loop over each y value, sum the mask A[:, 1] == y to get the number of matches,
sum the x values that match, divide to get the average, and place the result in a new array:
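B = np.zeros([6, 2])
for i in range(6):  # assumes the y values are exactly 0, 1, ..., 5
    nmatch = np.sum(A[:, 1] == i)      # number of rows with y == i
    nsum = np.sum(A[A[:, 1] == i, 0])  # sum of the x values in those rows
    B[i, 0] = nsum / nmatch            # mean x goes in column 0, as in the desired output
    B[i, 1] = i                        # y goes in column 1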
There must be a more Pythonic way of doing this...
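One possible way to drop both the explicit loop and the hard-coded 6 is to compare every row's y value against every unique y value at once through broadcasting. This is a sketch rather than a definitive answer; ys, mask and means are illustrative names, and the mask takes (number of groups) x (number of rows) memory, so it is only sensible for modest sizes.

```python
import numpy as np

A = np.array([[0.33703753, 3.], [0.90115394, 5.], [0.91172016, 5.],
              [0.93230994, 3.], [0.08084283, 3.], [0.71531777, 2.],
              [0.07880787, 3.], [0.03501083, 4.], [0.69253184, 4.],
              [0.62214452, 3.], [0.26953094, 1.], [0.4617873, 3.],
              [0.6495549, 0.], [0.84531478, 4.], [0.08493308, 5.]])

# Discover the y values instead of assuming they are 0..5
ys = np.unique(A[:, 1])

# mask[g, k] is True when row k belongs to group g; shape (len(ys), len(A))
mask = A[:, 1] == ys[:, None]

# Per group: sum of the matching x values divided by the number of matches
means = (mask * A[:, 0]).sum(axis=1) / mask.sum(axis=1)

# Mean x in column 0, y in column 1
B = np.column_stack((means, ys))
print(B)
```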