How to sort a numpy array that contains floats and strings in numeric order?

Time:01-05

I've got a numpy array that contains some numbers and strings in separate columns:

import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

print(a[a[:, 0].argsort()])

However, when I try to sort it by the first column using .argsort(), it is sorted in string order, not numeric order:

[['1e-05' 'C']
 ['2' 'B']
 ['3e-05' 'A']]

How do I go about getting the array to sort in numeric order based on the first column?

CodePudding user response:

In this case, a is an array of strings, as evidenced by a.dtype being '<U32'. Therefore, a[:, 0].argsort() will sort the column in lexical order.
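A quick way to confirm this for yourself is to inspect the dtype and compare against a plain string sort. This is only a minimal sketch; the variable names dtype_kind, first_column and lexical_order are made up for illustration.

```python
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# Mixing floats and strings makes NumPy coerce every element to a string.
dtype_kind = a.dtype.kind          # 'U' means Unicode string
first_column = a[:, 0].tolist()    # plain Python strings

# Strings are compared character by character, so '2' lands
# between '1e-05' and '3e-05'.
lexical_order = sorted(first_column)
print(dtype_kind, lexical_order)
```

The string sort reproduces the ordering from the question, which is what argsort is doing on the string column.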

To sort a column as numbers, it needs to be converted to numbers first, by calling .astype before .argsort:

a = np.array( [[ 3e-05, 'A' ],
[ 2, 'B' ],
[ 1e-05, 'C' ]]
)

print(a[a[:, 0].astype(float).argsort()])

Output:

[['1e-05' 'C']
 ['3e-05' 'A']
 ['2' 'B']]
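If you need this in more than one place, the same idea can be wrapped in a small helper. The function name sort_rows_numeric and its col and descending parameters are hypothetical, not part of NumPy, and the sketch assumes every entry in the chosen column can be parsed as a float.

```python
import numpy as np

def sort_rows_numeric(arr, col=0, descending=False):
    """Return the rows of a 2-D string array ordered by one column parsed as float."""
    keys = arr[:, col].astype(float)      # raises ValueError if an entry is not numeric
    order = keys.argsort(kind="stable")   # ascending order, ties keep their original order
    if descending:
        order = order[::-1]               # simple reversal, so ties are reversed too
    return arr[order]

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

ascending = sort_rows_numeric(a)
descending = sort_rows_numeric(a, descending=True)
print(ascending)
print(descending)
```

Note that the result is still an array of strings; only the sort key is converted to float.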

CodePudding user response:

If you have control over the creation of the array, you could create a structured array instead of a regular array.

dtypes = [('value', np.float64), ('label', '<U32')]

a = np.array( [( 3e-05, 'A' ),
               ( 2, 'B' ),
               ( 1e-05, 'C' )], dtype=dtypes)

Now, a is a structured array with a separate dtype for each field: the 'value' field holds floats, and the 'label' field holds strings.

Note that the array is defined as a list of tuples. This is important: defining it as a list of lists and then specifying dtype=dtypes won't work.
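If your data already arrives as a list of lists, one possible workaround is to convert each inner list to a tuple before building the array. A minimal sketch, where rows is a hypothetical name for the incoming data:

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

# Hypothetical input: the same data as a list of lists.
rows = [[3e-05, 'A'],
        [2, 'B'],
        [1e-05, 'C']]

# Convert each inner list to a tuple so NumPy treats it as one record.
a = np.array([tuple(row) for row in rows], dtype=dtypes)
print(a)
```

The result is a 1-D array of three records, the same as writing the tuples out by hand.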

Now, you can sort by a column like so:

a_sorted = np.sort(a, order=['value'])

which gives:

array([(1.e-05, 'C'), (3.e-05, 'A'), (2.e+00, 'B')],
      dtype=[('value', '<f8'), ('label', '<U32')])

You can get a row or column of this structured array like so:

>>> a_sorted[0]
(1.e-05, 'C')

>>> a_sorted['value']
array([1.e-05, 3.e-05, 2.e+00])
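If you want the sort indices rather than a sorted copy (for example, to reorder another array the same way), np.argsort accepts the same order argument on a structured array. A minimal sketch; the names order and labels_in_order are only for illustration:

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]
a = np.array([(3e-05, 'A'),
              (2, 'B'),
              (1e-05, 'C')], dtype=dtypes)

# Indices that would sort the records by the 'value' field.
order = np.argsort(a, order='value')

# Indexing with those indices gives the same result as np.sort(a, order='value').
a_sorted = a[order]
labels_in_order = a_sorted['label'].tolist()
print(order, labels_in_order)
```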