For some reason, when I sort this matrix by the 3rd column with argsort, it doesn't work when element (3,3) of my matrix is 10 or greater. If it's 9 or less, it seems to sort correctly. Does anyone know what's wrong here?
import numpy as np
X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X,y])
print(data)
sorted_data = data[data[:, 2].argsort()]
print(sorted_data)
(Screenshot of the resulting printout.)
Answer:
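As already pointed out in the comments, argsort() is working correctly, but once you stack X with y, the dtype of the whole array is no longer float but Unicode string (see this post). The third column is then sorted as text, character by character, so '10.0' comes before '3.0' (because '1' < '3'). With values of 9 or less, every value in that column has a single digit before the decimal point, so the text order happens to match the numeric order.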
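To see the effect directly, here is a small sketch using the arrays from the question. It compares argsort on the string column with argsort on the same column converted back to float:

```python
import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X, y])

# Mixing float and string columns promotes the whole array to a Unicode string dtype
print(data.dtype.kind)  # U

# The third column now holds the strings '3.0', '6.0', '10.0', '0.0'
col = data[:, 2]

# Strings compare character by character, so '10.0' sorts before '3.0'
string_order = col.argsort()
# Converting back to float restores numeric comparison
numeric_order = col.astype(float).argsort()

print(string_order)   # [3 2 0 1]
print(numeric_order)  # [3 0 1 2]
```

Only the relative position of the row holding 10 differs between the two orders, which matches the symptom in the question.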
Modified code, which converts the column back to float before sorting:
import numpy as np
X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X,y])
print(data)
# Convert the third column back to float so argsort compares numbers, not strings
sorted_data = data[data[:, 2].astype(float).argsort()]
print(sorted_data)
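Note that sorted_data is still a string array, because only the sort key was converted. If you need numbers again afterwards, one option is to split the sorted array back into its numeric part and its label column. This sketch assumes, as in the question, that the first three columns are numeric and the last one is the label:

```python
import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X, y])

# Sort by the third column, comparing its values as floats
sorted_data = data[data[:, 2].astype(float).argsort()]

# The result is still a string array: recover the numeric columns and the labels
sorted_X = sorted_data[:, :3].astype(float)
sorted_y = sorted_data[:, 3]

print(sorted_X)
print(sorted_y)  # ['T' 'T' 'F' 'T']
```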
Answer:
The issue is that you mixed string values with numeric values, and (unlike Python lists) NumPy arrays have a single dtype. If you print(data.dtype), you should see a string type (something like <U32) rather than float.
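If you want to keep everything in a single array anyway, one alternative is object dtype, where each cell keeps its own Python type instead of being converted to a string. A minimal sketch, with the trade-off that object arrays go through Python-level comparisons and are typically slower than numeric arrays:

```python
import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])

# Casting both parts to object dtype stops NumPy from turning the numbers into strings:
# each cell holds a Python float or a Python str
data = np.column_stack([X.astype(object), y.astype(object)])
print(data.dtype)  # object

# argsort on an object column uses Python comparisons, so 10.0 sorts after 6.0
sorted_data = data[data[:, 2].argsort()]
print(sorted_data)
```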
If you really want to combine the arrays, I suggest sorting both of them with the same index before combining them. Something like this should work:
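import numpy as np
X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])
data = np.column_stack([X,y])
# Compute the sort order on the numeric array X, before any strings are mixed in
sort_indices = X[:,2].argsort()
# Apply the same row order to both arrays, then combine them
X = X[sort_indices]
y = y[sort_indices]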
sorted_data = np.column_stack([X,y])
print(data)
print(sorted_data)
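Another way to keep mixed column types in one array is a NumPy structured array, where every field has its own dtype and np.sort can sort by a named field. In this sketch the field names a, b, c and label are hypothetical, chosen only for illustration:

```python
import numpy as np

X = np.array([[5, 2, 3], [2.222, 5.5, 6], [3.3, 8, 10], [1.05, 0, 0]])
y = np.array([['T'], ['F'], ['T'], ['T']])

# Hypothetical field names; the three numeric fields stay float64, the label stays a string
row_type = np.dtype([('a', float), ('b', float), ('c', float), ('label', 'U1')])
records = np.empty(len(X), dtype=row_type)
for i, name in enumerate(['a', 'b', 'c']):
    records[name] = X[:, i]
records['label'] = y[:, 0]

# np.sort with order= sorts by the named field, numerically because 'c' is a float field
sorted_records = np.sort(records, order='c')
print(sorted_records['c'])
print(sorted_records['label'])  # ['T' 'T' 'F' 'T']
```

The numeric fields never pass through a string representation, so no conversion back to float is needed after sorting.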