Finding the average of the x component of an array of coordinates, based on the y component

I have the following example array of x-y coordinate pairs:

import numpy as np

A = np.array([[0.33703753, 3.],
              [0.90115394, 5.],
              [0.91172016, 5.],
              [0.93230994, 3.],
              [0.08084283, 3.],
              [0.71531777, 2.],
              [0.07880787, 3.],
              [0.03501083, 4.],
              [0.69253184, 4.],
              [0.62214452, 3.],
              [0.26953094, 1.],
              [0.4617873 , 3.],
              [0.6495549 , 0.],
              [0.84531478, 4.],
              [0.08493308, 5.]])

My goal is to reduce this to an array with six rows by taking the average of the x-values for each y-value, like so:

array([[0.6495549 , 0.        ],
       [0.26953094, 1.        ],
       [0.71531777, 2.        ],
       [0.41882167, 3.        ],
       [0.52428582, 4.        ],
       [0.63260239, 5.        ]])

Currently I am achieving this by converting to a pandas dataframe, performing the calculation, and converting back to a numpy array:

>>> df = pd.DataFrame({'x':A[:, 0], 'y':A[:, 1]})
>>> df.groupby('y').mean().reset_index()
     y         x
0  0.0  0.649555
1  1.0  0.269531
2  2.0  0.715318
3  3.0  0.418822
4  4.0  0.524286
5  5.0  0.632602

Is there a way to perform this calculation using numpy, without having to resort to the pandas library?

CodePudding user response：

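Here is a workaround using NumPy. With return_inverse=True, np.unique returns the sorted distinct y-values and, for each row of A, the index of its y-value in that sorted array. The loop then averages the x-values that share each index.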

unique_ys, indices = np.unique(A[:, 1], return_inverse=True)
result = np.empty((unique_ys.shape[0], 2))

for i, y in enumerate(unique_ys):
    result[i, 0] = np.mean(A[indices == i, 0])
    result[i, 1] = y

print(result)

Alternative:
To make the code more Pythonic, you can build the result array with a list comprehension instead of a for loop.

unique_ys, indices = np.unique(A[:, 1], return_inverse=True)
result = np.array([[np.mean(A[indices == i, 0]), y] for i, y in enumerate(unique_ys)])

print(result)

Output:

[[0.6495549  0.        ]
 [0.26953094 1.        ]
 [0.71531777 2.        ]
 [0.41882167 3.        ]
 [0.52428582 4.        ]
 [0.63260239 5.        ]]
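A loop-free variant of the same idea is sketched below. It is not part of the answer above: it assumes np.bincount with its weights argument is acceptable for summing the x-values per group, and the helper name group_mean_by_y is made up for illustration.

```python
import numpy as np

A = np.array([[0.33703753, 3.], [0.90115394, 5.], [0.91172016, 5.],
              [0.93230994, 3.], [0.08084283, 3.], [0.71531777, 2.],
              [0.07880787, 3.], [0.03501083, 4.], [0.69253184, 4.],
              [0.62214452, 3.], [0.26953094, 1.], [0.4617873, 3.],
              [0.6495549, 0.], [0.84531478, 4.], [0.08493308, 5.]])


def group_mean_by_y(points):
    """Hypothetical helper: rows of [mean x, y], one per distinct y, sorted by y."""
    # inverse[k] is the group index of row k; counts[g] is the size of group g.
    unique_ys, inverse, counts = np.unique(
        points[:, 1], return_inverse=True, return_counts=True
    )
    # bincount with weights adds up the x-values that fall into each group.
    sums = np.bincount(inverse, weights=points[:, 0])
    return np.column_stack((sums / counts, unique_ys))


result = group_mean_by_y(A)
print(result)
```

Because every group returned by np.unique contains at least one row, the division never hits a zero count.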

CodePudding user response：

If you know the y-values beforehand, you can select the matching rows for each one with a boolean mask.

For example:

A[(A[:,1]==1),0] gives you all the x-values where the y-value equals 1.

So you could loop over each y-value, count the rows where A[:,1]==y[n], sum the x-values of those rows, divide the sum by the count to get the average, and store the result in a new array:

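B = np.zeros((6, 2))

for i in range(6):
    mask = A[:, 1] == i          # rows whose y-value equals i
    nmatch = np.sum(mask)        # number of matching rows
    nsum = np.sum(A[mask, 0])    # sum of the matching x-values

    B[i, 0] = nsum / nmatch      # average x in column 0, as in the desired output
    B[i, 1] = i                  # y-value in column 1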

There must be a more Pythonic way of doing this...
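One possible way to write the same known-y-values approach without an explicit loop is to compare every row against every y-value at once through broadcasting. This is only a sketch: it assumes, as the loop does, that the y-values are exactly 0 through 5 and that each one occurs at least once (otherwise the division yields nan). The names ys, mask, counts and sums are illustrative.

```python
import numpy as np

A = np.array([[0.33703753, 3.], [0.90115394, 5.], [0.91172016, 5.],
              [0.93230994, 3.], [0.08084283, 3.], [0.71531777, 2.],
              [0.07880787, 3.], [0.03501083, 4.], [0.69253184, 4.],
              [0.62214452, 3.], [0.26953094, 1.], [0.4617873, 3.],
              [0.6495549, 0.], [0.84531478, 4.], [0.08493308, 5.]])

ys = np.arange(6)                       # assumed: the y-values known in advance

# mask[i, k] is True when row k of A has y-value ys[i]; shape (6, 15).
mask = A[:, 1] == ys[:, None]

counts = mask.sum(axis=1)               # number of matches per y-value
sums = (mask * A[:, 0]).sum(axis=1)     # sum of matching x-values per y-value

B = np.column_stack((sums / counts, ys))  # columns: average x, y
print(B)
```

The mask holds one entry per (y-value, row) pair, so memory grows with the product of the two; for many distinct y-values a grouping based on np.unique is likely the better fit.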