np.argsort not sorting correctly when a value is over a certain threshold

For some reason, when I sort this matrix by the 3rd column with argsort, it does not work when element (3,3) of my matrix is 10 or greater. If it's 9 or less, it seems to sort correctly. Does anyone know what's wrong here?

import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X,y])
print(data)
sorted_data = data[data[:, 2].argsort()]
print(sorted_data)


CodePudding user response：

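As already pointed out in the comments, argsort() is working correctly. Once you stack X with y, however, the dtype of the full array is no longer float but a Unicode string type (see this post), so the third column is compared as text rather than as numbers, and '10.0' sorts before '3.0'.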
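To make the effect concrete, here is a small sketch built on the arrays from the question. The names col, string_order and numeric_order are only illustrative; the point is that the stacked column holds strings, so argsort orders it lexicographically.

```python
import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X, y])

print(X.dtype)          # float64
print(data.dtype.kind)  # 'U', i.e. a Unicode string dtype

# The third column now holds strings, not numbers.
col = data[:, 2]
print(col.tolist())

# Sorting strings compares character by character, so '10.0' < '3.0'.
string_order = col.argsort()
# Converting back to float restores the numeric ordering.
numeric_order = col.astype(float).argsort()

print(string_order.tolist())
print(numeric_order.tolist())
```

With a value of 9 or less in that position, every entry has a single digit before the decimal point, so the lexicographic and numeric orders happen to agree, which is why the problem only shows up at 10 and above.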

Modified code, which converts the column back to float before calling argsort():

import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X,y])
print(data)
sorted_data = data[data[:, 2].astype(float).argsort()]
print(sorted_data)
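Note that sorted_data is still a string array after this fix; only the sort key was converted. If the numbers are needed again afterwards, one possible sketch is to slice the columns apart and cast the numeric part back. The names numeric_part and labels are hypothetical, and the column positions assume the three-feature layout from the question.

```python
import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X, y])

# Sort rows by the third column, interpreted as floats.
sorted_data = data[data[:, 2].astype(float).argsort()]

# The result is still a string array, so cast the numeric columns back.
numeric_part = sorted_data[:, :3].astype(float)
labels = sorted_data[:, 3]

print(numeric_part.dtype)
print(numeric_part[:, 2])
print(labels)
```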

CodePudding user response：

The issue is that you mixed string values with numeric values, and (unlike Python lists) numpy arrays have a single dtype, so the numbers are converted to strings when the arrays are stacked. If you were to print(data.dtype), it should show a string-based type instead of float64.

If you really want to combine the arrays, then I suggest that you sort both of them with the same index array before combining them. Something like this should work:

import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])

data = np.column_stack([X,y])

sort_indices = X[:,2].argsort()
X = X[sort_indices]
y = y[sort_indices]

sorted_data = np.column_stack([X,y])
print(data)
print(sorted_data)
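If the combined string array is not actually required, the same idea can be wrapped in a small helper that keeps the numeric array and the labels separate, so X stays float64 throughout. This is only a sketch: sort_together is a hypothetical name, not a numpy function, and kind="stable" is an optional choice that keeps tied rows in their original order.

```python
import numpy as np

def sort_together(features, labels, column):
    """Sort both arrays by one numeric column of `features` (hypothetical helper)."""
    # Compute the row order once from the numeric array.
    order = np.argsort(features[:, column], kind="stable")
    # Apply the same order to both arrays so rows stay aligned.
    return features[order], labels[order]

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])

X_sorted, y_sorted = sort_together(X, y, 2)
print(X_sorted)
print(y_sorted.ravel())
```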