Home > Back-end >  Filtering an np array using values from each row
Filtering an np array using values from each row

Time:02-13

I have an array with the shape (29,2):

arr = [[ 405.95168576 1033.        ]
 [ 406.23572583 1033.        ]
 [ 407.49812423 1028.        ]
 [ 402.66145728 1029.        ]
 [ 404.11080846 1032.        ]
 [ 401.75897118 1033.        ]
 [ 402.29352509 1029.        ]
 [ 402.34504752 1024.        ]
 [ 402.69938672 1027.        ]
 [ 400.55298544 1029.        ]
 [ 401.41432112 1027.        ]
 [ 400.89318038 1027.        ]
 [ 401.07444532 1029.        ]
 [ 400.43212193 1033.        ]
 [ 400.38178995 1027.        ]
 [ 399.89895625 1025.        ]
 [ 399.88394127 1031.        ]
 [ 399.97766298 1021.        ]
 [ 399.68084993 1027.        ]
 [ 399.65810987 1029.        ]
 [ 399.40565484 1020.        ]
 [ 399.34339145 1023.        ]
 [ 399.39613518 1019.        ]
 [ 399.37733697 1020.        ]
 [ 399.38314402 1020.        ]
 [ 399.47479381 1025.        ]
 [ 399.44134998 1025.        ]
 [ 399.43511907 1020.        ]
 [ 399.40346787 1020.        ]]

I would like to filter this array to find whether each row contains the maximum value for column arr[:,0], of all rows in which the value of arr[:,1] is equal to or lower than the one contained in that row.

At the moment, I have the following code, which produces the correct result:

import numpy as np

res = np.array([])

for i in range(arr.shape[0]):
    print(np.max(arr[:,0][arr[:,1] <= arr[i][1]]))
    if arr[i][0] >= np.max(arr[:,0][arr[:,1] <= arr[i][1]]):
        res = np.hstack((res, True))
    else:
        res = np.hstack((res, False))

print(res)

Is there a way to perform this operation in pure numpy, i.e. without using the loop?

CodePudding user response:

The following approach:

  • uses np.lexsort to order the array first by the second column ascending, then by the first column descending
  • uses np.maximum.accumulate to calculate the accumulated maxima
  • reverses the sorted order back to the original order to be able to compare
import numpy as np
arr = np.array([[405.95168576, 1033], [406.23572583, 1033], [407.49812423, 1028], [402.66145728, 1029], [404.11080846, 1032], [401.75897118, 1033], [402.29352509, 1029], [402.34504752, 1024], [402.69938672, 1027], [400.55298544, 1029], [401.41432112, 1027], [400.89318038, 1027], [401.07444532, 1029], [400.43212193, 1033], [400.38178995, 1027], [399.89895625, 1025], [399.88394127, 1031], [399.97766298, 1021], [399.68084993, 1027], [399.65810987, 1029], [399.40565484, 1020], [399.34339145, 1023], [399.39613518, 1019], [399.37733697, 1020], [399.38314402, 1020], [399.47479381, 1025], [399.44134998, 1025], [399.43511907, 1020], [399.40346787, 1020]])

# sort on arr[:,1] ascending then on arr[:,0] descending, return the indices
ind_sorted = np.lexsort((-arr[:, 0], arr[:, 1]))
# calculate the accumulated maxima of the sorted list
max_per_level_sorted = np.maximum.accumulate(arr[ind_sorted, 0])
# get the ordering that maps the sorted values back to the originals
reverse_sorted = np.argsort(ind_sorted)
# get the maxima in the order of the original array
max_per_level = max_per_level_sorted[reverse_sorted]
res = arr[:, 0] >= max_per_level
print(res.astype(int))  # res is a boolean array, show it as integers

[0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0]

If you really want to, you can compress all this together:

ind_sorted = np.lexsort((-arr[:, 0], arr[:, 1]))
res = arr[:, 0] >= np.maximum.accumulate(arr[ind_sorted, 0])[np.argsort(ind_sorted)]
print(res.astype(int))

Here is a visualization:

import matplotlib.pyplot as plt

plt.scatter(arr[:, 1], arr[:, 0], color='dodgerblue')
plt.scatter(arr[res, 1], arr[res, 0], fc='none', ec='crimson', lw=2, s=100, marker='H')

visualizing the selected points

  • Related