Home > Enterprise >  Why is Python's set() sorting a list in some cases?
Why is Python's set() sorting a list in some cases?

Time:01-03

I'm confused by the behavior of Python's set() in this example:

random_number_list = [randint(1, 10) for _ in range(10)]
# This will be sorted!
unique_numbers = set(random_number_list)

print(
    f"random_number_list/unique_numbers with same upper bound for randint() and range():\n{random_number_list=}\n{unique_numbers=}\n"
)

random_number_list = [randint(1, 100) for _ in range(10)]
# This will not be sorted.
unique_numbers = set(random_number_list)

print(
    f"random_number_list/unique_numbers with different upper bound for randint() and range():\n{random_number_list=}\n{unique_numbers=}\n"
)

It seems like set() is sorting the random_number_list if the length of the list and the upper bound of randint() are the same:

➜  ch-2 python --version
Python 3.10.0
➜  ch-2 python find_k_smallest.py 
random_number_list/unique_numbers with same upper bound for randint() and range():
random_number_list=[10, 1, 2, 5, 5, 7, 8, 8, 2, 8]
unique_numbers={1, 2, 5, 7, 8, 10}

random_number_list/unique_numbers with different upper bound for randint() and range():
random_number_list=[35, 1, 17, 26, 17, 74, 26, 11, 44, 13]
unique_numbers={1, 35, 74, 11, 44, 13, 17, 26}

The docs state:

A set object is an unordered collection of distinct hashable objects.

What's going on here? Why is set() sorting the random_number_list in certain cases and not others?

Edit Neither of these questions addresses my issue:

CodePudding user response:

To actually answer your question. Many implementations of sets use an implementation similar to hashtables. Items are hashed and placed into an "array" based on that hash value.

Notice that for small integers, hash(x) == x. So 1 will go into slot 1, 2 will go into slot 2, 3 into slot 3, etc. Then when the elements are read, you'll literally get the elements sorted.

However if your integers are bigger than the array size, then their placement in the array will be modulo the size of the array. They will no longer be sorted.

Again, I have not actually looked at the Python implementation. This is simply a possible explanation of what could be happening.

CodePudding user response:

"Unordered" does not mean "not sorted". It means no attempt is made to provide any particular order; the order that falls out from the implementation may or may not be a sorted order.

CodePudding user response:

You wrote in a comment:

I'm curious as to why set() is ordering its members sometimes when the size of the list is related to the bounds of randint().

That is an implementation detail that an application should not concern itself with, as even in Python 3.7 (and 3.10), sets are documented as "unordered collection[s]". You can look up, for example, the source code of CPython to find out how sets are implemented in CPython.

See also:

  • Related