Why does removing duplicates from a list with set() give the output in a different order each time?

Time:08-11

I know how to remove duplicates from a list. I'm just curious why a set does not keep the order of the original list.

>>> my_list = ['apple', 'mango', 'grape', 'apple', 'guava', 'pumpkin']
>>> [*set(my_list)]

# output (two different runs):
['mango', 'apple', 'grape', 'guava', 'pumpkin']
['pumpkin', 'guava', 'grape', 'mango', 'apple']

CodePudding user response:

As all the comments say, a set is unordered, always.

Internally, though, it uses a hash table, and IIRC the slot where each value is stored is derived from the hash of the object modulo the table size. Small non-negative integers are their own hash values, so you may get the impression that they come out sorted (by value, not by insertion order), but this won't always be the case:

>>> ls = [1, 2, 3]
>>> [*set(ls)]
[1, 2, 3]

>>> ls = [2, 1, 3]
>>> [*set(ls)]
[1, 2, 3]

>>> ls2 = [-1, -2, 3]
>>> [*set(ls2)]
[3, -1, -2]

>>> ls2 = [-2, -1, 3]
>>> [*set(ls2)]
[3, -2, -1]
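
To make the integer results above a bit more concrete, here is a rough sketch of how a value's hash could map to a starting slot. The table size of 8 and the masking formula are assumptions about CPython's small-set layout, and first_slot is a hypothetical helper, not anything Python exposes. The real implementation also probes further slots on a collision, which this sketch ignores.

```python
# Assumption: a small CPython set starts with 8 slots and picks the first
# slot to try as hash(value) & (table_size - 1). Illustration only.
TABLE_SIZE = 8


def first_slot(value, table_size=TABLE_SIZE):
    """Return the first slot a value would be tried in (hypothetical helper)."""
    return hash(value) & (table_size - 1)


# Print each value, its hash, and its assumed starting slot.
for value in (1, 2, 3, -1, -2):
    print(value, hash(value), first_slot(value))
```

In CPython, hash(-1) is -2, the same as hash(-2), so under these assumptions both negative numbers start at the same slot, behind the slot for 3. That would be consistent with 3 coming out first and the two negatives following in insertion order.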

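Other objects, like the strings in your example, have very different hash values. In addition, Python salts string hashes with a random value each time the interpreter starts (unless PYTHONHASHSEED is set), so the order can change from one run to the next: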

>>> hash('mango')
-7062263298897675226
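
The hash value shown above is specific to one interpreter session. As a minimal sketch of why that matters, the snippet below starts fresh interpreters with a chosen PYTHONHASHSEED and compares the hash of the same string. The helper name string_hash_in_new_process is made up for this example, and it assumes sys.executable points at a Python that can be launched as a subprocess.

```python
import os
import subprocess
import sys


def string_hash_in_new_process(text, seed):
    """Return hash(text) as computed by a fresh interpreter with the given seed."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


same_a = string_hash_in_new_process("mango", 1)
same_b = string_hash_in_new_process("mango", 1)
other = string_hash_in_new_process("mango", 2)

print(same_a == same_b)  # same seed, so the hashes should match
print(same_a == other)   # different seed, so they will almost certainly differ
```

With no seed set, each run picks its own random salt, which is why the set of strings in the question comes out in a different order from run to run. If the original order matters, list(dict.fromkeys(my_list)) removes duplicates while keeping first-seen order on Python 3.7 and later.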