Suppose I have a NumPy array of coordinates [x, y].
I want to filter this array.
For all coordinates in the array with the same x-value, I want to keep only one coordinate:
the one with the maximum y-value.
What is the most efficient or Pythonic way to do this?
I will explain with an example below.
coord_arr = np.array([[10, 5], [11, 6], [12, 6], [10, 1], [11, 0], [12, 2]])
[10, 5] and [10,1] have the same x-value: x=10
maximum for y-values:
max(5,1) = 5
So I only keep coordinate [10,5]
Same procedure for x=11 and x=12
So I finally end up with:
filtered_coord_arr= array([[10,5],[11,6],[12,6]])
I have a solution that converts the array to a list and uses list comprehensions (see below),
but I am looking for a more efficient and elegant solution.
(The actual arrays are much larger than in this example.)
My solution:
coord_list = coord_arr.tolist()
x_set = {coord[0] for coord in coord_list}
coord_max_y_list = []
for x in x_set:
    # All coordinates that share this x-value
    compare_list = [coord for coord in coord_list if coord[0] == x]
    # Keep the coordinate with the largest y-value
    coord_max = max(compare_list, key=lambda coord: coord[1])
    coord_max_y_list.append(coord_max)
filtered_coord_arr = np.array(coord_max_y_list)
CodePudding user response:
You can do this with a pandas groupby, as in the solution below.
Solution:
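import numpy as np
import pandas as pd

coord_arr = np.array([[10, 5], [11, 6], [12, 6], [13, 7], [10, 1], [10, 7], [12, 2], [13, 0]])
df = pd.DataFrame(coord_arr, columns=['a', 'b'])
# Group by x (column 'a') and take the maximum y (column 'b') in each group
df = df.groupby(['a']).agg({'b': ['max']})
df.columns = ['b']  # flatten the two-level column index created by agg
df = df.reset_index()
filtered_coord_arr = np.array(df)
filtered_coord_arr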
Output:
array([[10,  7],
       [11,  6],
       [12,  6],
       [13,  7]], dtype=int64)
CodePudding user response:
If your array (called coord here) is small, you can do it in one line:
np.array([[x, max(coord[coord[:,0] == x][:,1])] for x in set(coord[:,0])])
However, that rescans the whole array once for every unique x-value, so it scales poorly. If the array is big and you care about complexity, make a single pass with a dictionary instead:
d = {}
for x, y in coord:
    d[x] = max(d.get(x, float('-Inf')), y)
np.array([[x, y] for x, y in d.items()])
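If you would rather stay in NumPy and avoid the Python-level loop, one possible alternative (not taken from the answers above, just a sketch) is to sort the rows by x and then by y, and keep the last row of each x-group. The function name max_y_per_x is made up for illustration, and the sketch assumes a 2-D array with exactly two columns. Note that the result comes back sorted by x, not in the original row order.

```python
import numpy as np

def max_y_per_x(coords):
    """Keep, for each distinct x, the row with the largest y (hypothetical helper)."""
    coords = np.asarray(coords)
    # lexsort treats the LAST key as the primary one: sort by x, then by y within each x
    order = np.lexsort((coords[:, 1], coords[:, 0]))
    sorted_coords = coords[order]
    # A row is the last of its x-group when the next row has a different x;
    # the final row is always the last of its group
    is_last = np.ones(len(sorted_coords), dtype=bool)
    is_last[:-1] = sorted_coords[1:, 0] != sorted_coords[:-1, 0]
    return sorted_coords[is_last]

coord_arr = np.array([[10, 5], [11, 6], [12, 6], [10, 1], [11, 0], [12, 2]])
filtered_coord_arr = max_y_per_x(coord_arr)
print(filtered_coord_arr)
```

The sort makes this O(n log n) rather than the single O(n) pass of the dictionary version, but all the work happens inside NumPy, which is often faster in practice for large arrays. Worth timing on your real data before committing to either.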