Home > front end >  Why does unordered_set maintain insertion order here
Why does unordered_set maintain insertion order here

Time:12-22

I know unordered_set doesn't guarantee to maintain order. But I was wondering if simple ascending sequence say N natural numbers was inserted to a set, will it maintain the order. It did..

unordered_set<int> nums;
for (int i=10; i>0; --i) {
    nums.insert(i);
}
nums.erase(4);
for (auto i : nums) {
    cout<< i<< endl;
}

This prints:

1
2
3
5
6
7
8
9
10

I ran this several times with consistent result.

  • Why did it maintaint reverse order of insertion?
  • Is there an solid reason for this behaviour?(If this works, it can make my code super efficient ;) )

CodePudding user response:

if simple ascending sequence say N natural numbers was inserted to a set, will it maintain the order. It did...

But it didn't, it reversed the order. Like you say, in fact:

Why did it maintain reverse order of insertion?

This is just an implementation detail. How any implementation of std::unordered_set stores its elements is not specified by the standard, except for the fact that it has buckets. Another implementation could very well store these 10 integers in order of insertion (not in the reverse order), or really any order at all.

Is there an solid reason for this behaviour?

Kind of. I will take GCC's implementation as an example.

std::unordered_set stores elements in buckets. When inserting a first element, it allocates enough space for a small number, say 13. This space is divided into 13 buckets, and each correspond to the hash of the integers 0 through 12 (the hash being the integer itself). That way, inserting the next few elements, which are all integers between 0 and 12, does not cause any rehash or collision and each element takes a bucket. Again, the reason they end up in the reverse order is an implementation detail and not relevant to this part.

However, if you insert more than 13 elements, the set needs to reallocate memory and move the elements, and at that point their order can change. In the case of GCC's implementation, it turns out the elements end up in the insertion order after the move, and so inserting the integers from 0 to 13 gives this sequence: 13 0 1 2 3 4 5 6 7 8 9 10 11 12.

You can look at the buckets yourself:

#include <unordered_set>
#include <iostream>

int main() {
    std::unordered_set<int> s;
    auto print_s = [&s](){
        std::cout << "s = [ ";
        for (auto i : s) {
            std::cout << i << " ";
        }
        std::cout << "]\n";
    };

    auto print_bucket = [&s](int bucket){
        for (auto it = s.begin(bucket); it != s.end(bucket);   it) {
            std::cout << *it << " ";
        }
        std::cout << "\n";
    };
    
    for (int i = 0; i < 20;   i) {
        std::cout << "i=" << i << "\n";
        s.insert(i);
        std::cout << "bucket_count=" << s.bucket_count() << "\n";
        for (auto b = 0; b < s.bucket_count();   b) {
            if (s.bucket_size(b) != 0) {
                std::cout << "\tb=" << b << ": ";
                print_bucket(b);
            } else {
                std::cout << "\tb=" << b << ": empty\n";
            }
        }
        print_s();
    }
}

Demo
Clang (with libc - thanks Miles Budnek) and MSVC do something completely different from GCC (libstdc ).

CodePudding user response:

The container keep reverse insertion order is probably a side effect.

The primary reason would be to avoid iterate through all empty bucket which may be inefficient, by implement it, the insertion order is somewhat preserved.

From what it appears, seems like it simply keep the last bucket pointer/id and link to it when new bucket is used (and result in a reverse-linked buckets).

  • Related