I've got a numpy array that contains some numbers and strings in separate columns:
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])
print(a[a[:, 0].argsort()])
However, when I try to sort it by the first column using .argsort(), it is sorted in string order, not numeric order:
[['1e-05' 'C']
['2' 'B']
['3e-05' 'A']]
How do I go about getting the array to sort in numeric order based on the first column?
CodePudding user response:
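In this case, a is an array of strings, as evidenced by a.dtype being '<U32'. A regular NumPy array has a single dtype, so mixing floats and strings converts every element to a string. Therefore, a[:, 0].argsort() sorts the column in lexical order.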
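A quick check makes the cause visible. This is only a sketch that rebuilds the array from the question; the names first_col, string_order and numeric_order are illustrative.

```python
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])

# Mixing floats and strings makes NumPy store every element as a string.
print(a.dtype.kind)  # 'U' means a Unicode string dtype

first_col = a[:, 0]

# Strings compare character by character, so '2' sorts before '3e-05'.
string_order = first_col[first_col.argsort()]

# Comparing by the parsed float value gives the numeric order instead.
numeric_order = sorted(first_col, key=float)

print(string_order)
print(numeric_order)
```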
To sort a column as numbers, it needs to be converted to numbers first, by calling .astype(float) before .argsort():
import numpy as np

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])
print(a[a[:, 0].astype(float).argsort()])
Output:
[['1e-05' 'C']
['3e-05' 'A']
['2' 'B']]
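If this comes up more than once, the conversion can be wrapped in a small helper. The function name sort_rows_by_numeric_column is hypothetical, and the sketch assumes every entry in the chosen column can be parsed as a float.

```python
import numpy as np

def sort_rows_by_numeric_column(arr, col=0):
    """Return the rows of a 2-D string array ordered by one column read as floats."""
    # Convert only the key column; the rest of the array is left untouched.
    keys = arr[:, col].astype(float)
    # A stable sort keeps rows with equal keys in their original order.
    order = keys.argsort(kind="stable")
    return arr[order]

a = np.array([[3e-05, 'A'],
              [2, 'B'],
              [1e-05, 'C']])
result = sort_rows_by_numeric_column(a)
print(result)
```

Note that the result is still an array of strings; only the ordering is based on the numeric values.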
CodePudding user response:
If you have control over the creation of the array, you could create a structured array instead of a regular array.
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]
a = np.array([(3e-05, 'A'),
              (2, 'B'),
              (1e-05, 'C')], dtype=dtypes)
Now, a is a structured array with separate dtypes for the first and second columns -- the first column is an array of floats, and the second column is an array of strings.
Note that the array is defined as a list of tuples. This is important: defining it as a list of lists and then specifying dtype=dtypes won't work.
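If the data already exists as a list of lists, one way around this is to convert each row to a tuple before building the array. A minimal sketch, where rows is an assumed name for the existing data:

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]

# Hypothetical input that arrives as a list of lists.
rows = [[3e-05, 'A'],
        [2, 'B'],
        [1e-05, 'C']]

# Turn each inner list into a tuple so NumPy reads it as one record.
a = np.array([tuple(row) for row in rows], dtype=dtypes)
print(a)
```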
Now, you can sort by a column like so:
a_sorted = np.sort(a, order=['value'])
which gives:
array([(1.e-05, 'C'), (3.e-05, 'A'), (2.e+00, 'B')],
      dtype=[('value', '<f8'), ('label', '<U32')])
You can get a row or column of this structured array like so:
>>> a_sorted[0]
(1.e-05, 'C')
>>> a_sorted['value']
array([1.e-05, 3.e-05, 2.e+00])
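Because a field can be pulled out as a plain float array, it can also be passed to np.argsort when the sort indices themselves are needed, for example to get descending order. A sketch under the same dtype definition; the names order, ascending and descending are illustrative:

```python
import numpy as np

dtypes = [('value', np.float64), ('label', '<U32')]
a = np.array([(3e-05, 'A'),
              (2, 'B'),
              (1e-05, 'C')], dtype=dtypes)

# Indices that would sort the records by the 'value' field.
order = np.argsort(a['value'])

ascending = a[order]
# Reversing the index array gives largest-first order.
descending = a[order[::-1]]

print(ascending['label'])
print(descending['label'])
```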