Python - argsort sorting incorrectly

Time:07-06

What is the problem? What am I doing wrong?

I am new to Python and I could not find the problem. Thanks a lot in advance for your help.

The code is

import numpy as np
users = [["Richard", 18],["Sophia", 16],["Kelly", 3],["Anna", 15],["Nicholas", 17],["Lisa", 2]]
users = np.array(users)
print(users[users[:, 1].argsort()])

Output should be

[['Lisa' '2']
 ['Kelly' '3']
 ['Anna' '15']
 ['Sophia' '16']
 ['Nicholas' '17']
 ['Richard' '18']]

But output is

[['Anna' '15']
 ['Sophia' '16']
 ['Nicholas' '17']
 ['Richard' '18']
 ['Lisa' '2']
 ['Kelly' '3']]

CodePudding user response:

The numbers are being stored as strings, and strings are compared character by character, so '15' sorts before '2' just as 'ae' sorts before 'b'. The single quotes around values such as '15' in your output are a clue to this.
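A quick way to check this yourself is to look at the array's dtype and at how Python compares numeric strings. This is only a small sketch with a shortened version of your list:

```python
import numpy as np

# Mixing str and int in one list makes NumPy pick a single string dtype.
users = np.array([["Richard", 18], ["Anna", 15], ["Lisa", 2]])
print(users.dtype.kind)          # 'U' means a Unicode string dtype

# Strings compare character by character, so anything starting
# with '1' sorts before anything starting with '2'.
print("15" < "2")                # True
print(sorted(["2", "15", "3"]))  # ['15', '2', '3']
```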

To create a numpy array that holds a mixture of data types (strings for the names, ints for the numbers), pass dtype=object when you build the array from the original list:

users = np.array(users, dtype=object)

That gives the order you're looking for. The numbers stay ints, so they print without quotes (for example ['Lisa' 2]).
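Put together with the code from the question, it could look like the sketch below. The names users_arr and sorted_users are just ones I picked for illustration:

```python
import numpy as np

users = [["Richard", 18], ["Sophia", 16], ["Kelly", 3],
         ["Anna", 15], ["Nicholas", 17], ["Lisa", 2]]

# dtype=object keeps each element as its original Python object (str or int).
users_arr = np.array(users, dtype=object)

# The second column now holds ints, so argsort orders it numerically.
sorted_users = users_arr[users_arr[:, 1].argsort()]
print(sorted_users)
```

Keep in mind that object arrays give up most of numpy's speed advantages, so this is mainly convenient for small tables like this one.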

CodePudding user response:

Try this if you don't want to change the dtype:

print(users[users[:, 1].astype(float).argsort()])

It should give you the result you are looking for. Only the sort key is converted to float; the array itself still holds strings. The answer given by slothrop explains why the conversion is needed.
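As a self-contained sketch of the same idea, here the column is converted with astype(int) instead of astype(float), which should work just as well as long as every value is a whole number. The variable name order is only for illustration:

```python
import numpy as np

users = np.array([["Richard", 18], ["Sophia", 16], ["Kelly", 3],
                  ["Anna", 15], ["Nicholas", 17], ["Lisa", 2]])

# Convert only the sort key to numbers; the array itself is left alone.
order = users[:, 1].astype(int).argsort()
sorted_users = users[order]

print(sorted_users)
print(sorted_users.dtype.kind)  # still 'U': the values remain strings
```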

CodePudding user response:

When the list is converted to an np.array, the integers are converted to strings, and strings sort differently from numbers.

You could sort the list first and then convert it to an array (if you really need one).

users = [["Richard", 18],["Sophia", 16],["Kelly", 3],["Anna", 15],["Nicholas", 17],["Lisa", 2]]
users_sorted = sorted(users, key=lambda x: x[1])
print(users_sorted)

[['Lisa', 2], ['Kelly', 3], ['Anna', 15], ['Sophia', 16], ['Nicholas', 17], ['Richard', 18]]
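If you do convert afterwards, the order is kept, but without dtype=object the numbers turn into strings again. A small sketch (as_strings and as_objects are names made up for this example):

```python
import numpy as np

users = [["Richard", 18], ["Sophia", 16], ["Kelly", 3],
         ["Anna", 15], ["Nicholas", 17], ["Lisa", 2]]
users_sorted = sorted(users, key=lambda x: x[1])

# Plain conversion: row order is preserved, but every value becomes a string.
as_strings = np.array(users_sorted)
# With dtype=object the ints stay ints.
as_objects = np.array(users_sorted, dtype=object)

print(as_strings[0])  # ['Lisa' '2']
print(as_objects[0])  # ['Lisa' 2]
```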

CodePudding user response:

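If there is a possibility of multiple people having the same number, you probably want to use np.lexsort instead of np.argsort, so that rows are sorted on the number first and ties are broken by name. Note that np.lexsort treats the last key in the tuple as the primary one: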

import numpy as np

users = [["Richard", 18], ["Sophia", 16], ["Kelly", 3], ["Bob", 15], ["Anna", 15], ["Nicholas", 17], ["Lisa", 2]]
users = np.array(users, dtype=object)
sorted_users = users[np.lexsort((users[:,0], users[:,1]))]
print(sorted_users)

Output:

[['Lisa' 2]
 ['Kelly' 3]
 ['Anna' 15]
 ['Bob' 15]
 ['Sophia' 16]
 ['Nicholas' 17]
 ['Richard' 18]]
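To see the key order in isolation, here is a sketch that keeps the names and numbers in two separate typed arrays (my own variation, not required for the approach above) and prints the index order for both key arrangements:

```python
import numpy as np

names = np.array(["Richard", "Sophia", "Kelly", "Bob", "Anna", "Nicholas", "Lisa"])
ages = np.array([18, 16, 3, 15, 15, 17, 2])

# The LAST key is the primary one: sort by age, break ties by name.
order = np.lexsort((names, ages))
print(order)          # [6 2 4 3 1 5 0]
print(names[order])

# Swapping the keys sorts by name first instead.
by_name = np.lexsort((ages, names))
print(names[by_name])
```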

The equivalent without using numpy would be like this:

users = [["Richard", 18], ["Sophia", 16], ["Kelly", 3], ["Bob", 15], ["Anna", 15], ["Nicholas", 17], ["Lisa", 2]]
sorted_users = sorted(users, key=lambda user: (user[1], user[0]))
print(sorted_users)

Output:

[['Lisa', 2], ['Kelly', 3], ['Anna', 15], ['Bob', 15], ['Sophia', 16], ['Nicholas', 17], ['Richard', 18]]