Return the 4 rows of a NumPy array with the largest values in column 1

I have the following array, MyArray:

[['AZ' 0.144]
 ['RZ' 14.021]
 ['BH' 1003.487]
 ['NE' 1191.514]
 ['FG' 550.991]
 ['MA' nan]]

The array's shape is:

MyArray.shape
(6,2)

How would I return the 4 rows with the largest values in column 1?

The expected output would be:

[['RZ' 14.021]
 ['BH' 1003.487]
 ['NE' 1191.514]
 ['FG' 550.991]]

I tried:

MyArray[np.argpartition(MyArray, -2)][:-4]

But this raises an error:

TypeError: '<' not supported between instances of 'float' and 'str'

What am I doing wrong?

CodePudding user response：

Drop the NaN row, sort by the second column, and take the last 4 rows:

import numpy as np
a = np.array(
    [['AZ', 0.144],
     ['RZ', 14.021],
     ['BH', 1003.487],
     ['NE', 1191.514],
     ['FG', 550.991],
     ['MA', np.nan]],
)

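# Drop rows whose second column is NaN
a = a[~np.isnan(a[:, 1].astype(float))]
# Sort the remaining rows by the second column, ascending
srt = a[a[:, 1].astype(float).argsort()]
# The last 4 rows hold the 4 largest values
print(srt[-4:, :])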
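
As for the error in the question, a likely cause is that np.argpartition was applied to the whole 2-D array. By default it works along the last axis, so inside each row it has to compare a name with a number. The sketch below reproduces this; it assumes the question's array was built with dtype=object, and the names raised and col_idx are only illustrative.

```python
import numpy as np

MyArray = np.array(
    [['AZ', 0.144], ['RZ', 14.021], ['BH', 1003.487],
     ['NE', 1191.514], ['FG', 550.991], ['MA', np.nan]],
    dtype=object,  # assumption: the question's array holds Python objects
)

# With no axis argument, argpartition works along the last axis,
# i.e. inside each row, so it has to compare 'AZ' with 0.144.
try:
    np.argpartition(MyArray, -2)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Restricting the call to column 1 (converted to float) avoids mixing types.
col_idx = np.argpartition(MyArray[:, 1].astype(float), 2)
print(col_idx)
```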

CodePudding user response：

Let's start with a remark on how to create MyArray: you have to pass dtype=object, otherwise the array gets a Unicode string dtype (reported here as <U8) and the numbers are stored as text.

Start by setting the number of rows to retrieve:
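
To illustrate the remark about dtype=object, here is a small sketch on a shortened copy of the data. The names rows, as_text and as_objects are made up for this example.

```python
import numpy as np

rows = [['AZ', 0.144], ['RZ', 14.021], ['MA', np.nan]]

as_text = np.array(rows)                    # every element becomes a string
as_objects = np.array(rows, dtype=object)   # elements keep their Python types

print(as_text.dtype.kind)       # 'U' stands for a Unicode string dtype
print(type(as_text[0, 1]))      # a NumPy string, not a number
print(type(as_objects[0, 1]))   # a plain Python float
```

With the object array the second column can be compared numerically, which is what the argpartition call relies on.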

n = 4

Then compute the result:

result = MyArray[np.argpartition(MyArray[:, 1], n)[:n]]

The result is:

array([['AZ', 0.144],
       ['RZ', 14.021],
       ['FG', 550.991],
       ['BH', 1003.487]], dtype=object)

How this code works:

np.argpartition(MyArray[:, 1], n) returns array([0, 1, 4, 2, 3, 5], dtype=int64). The first 4 elements are the indices of the rows with the 4 lowest values in column 1; their order within that group is not guaranteed.
…[:n] keeps only the indices of those lowest rows.
MyArray[…] retrieves the indicated rows.
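
The partitioning step can be checked in isolation. This is only a sketch: it uses the numeric column as a plain float array and leaves the NaN row out for simplicity, and the names values, part and lowest are illustrative.

```python
import numpy as np

# Column 1 of the example data, with the NaN row left out for simplicity
values = np.array([0.144, 14.021, 1003.487, 1191.514, 550.991])
n = 4

part = np.argpartition(values, n)   # the element at position n is in its sorted place
lowest = part[:n]                   # indices of the n smallest values

# The order inside the group is not guaranteed, so compare as a sorted list
print(sorted(lowest.tolist()))
```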

Another possible solution, perhaps easier to read:

result = np.take(MyArray, np.argpartition(MyArray[:, 1], n)[:n], axis=0)
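
Note that both calls above pick the n lowest values, whereas the question asks for the 4 largest and leaves out the NaN row. One possible adaptation is sketched below. It assumes NaN rows should be dropped first, and the names values, keep, valid and top_idx are illustrative rather than part of the original answer.

```python
import numpy as np

MyArray = np.array(
    [['AZ', 0.144], ['RZ', 14.021], ['BH', 1003.487],
     ['NE', 1191.514], ['FG', 550.991], ['MA', np.nan]],
    dtype=object,
)
n = 4

# Convert column 1 to float so NaN can be detected reliably
values = MyArray[:, 1].astype(float)
keep = ~np.isnan(values)
valid = MyArray[keep]              # rows without NaN in column 1

# Indices of the n largest values (order inside the group is not guaranteed)
top_idx = np.argpartition(values[keep], -n)[-n:]

# Sorting the indices restores the original row order
top_rows = valid[np.sort(top_idx)]
print(top_rows)
```

Sorting the indices is optional; it is done here only so that the rows come back in the same order as in the question's expected output.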