Home > Net >  Why is "insert" of unordered_set Undefine Behavior here?
Why is "insert" of unordered_set Undefine Behavior here?

Time:12-13

I am sorry for my fuzzy title.

Suppose there are some pointers of nodes, and I want to collect the nodes' pointers with unique value.

struct node_t
{
    int value;
    node_t(int v = -1) : value(v) {}
};

For example, if we have 4 pointers:

p1 points to node(1)
p2 points to node(1)
p3 points to node(2)
p4 points to node(2)

then I want to collect {p1, p3} here.

And this is what my code wrote:

#include <iostream>
#include <unordered_set>
#include <algorithm>
using namespace std;
struct node_t
{
    int value;
    node_t(int v = -1) : value(v) {}
};
struct myequal
{
    bool operator()(const node_t *p1, const node_t *p2) const
    {
        return p1->value == p2->value;
    }
};
int main()
{
    unordered_set<node_t *, hash<node_t *>, myequal> table;
    node_t n1(0), n2(0);
    table.insert(&n1), table.insert(&n2);
    cout << (&n1) << '\n';
    cout << (&n2) << '\n';
    cout << table.size() << '\n';
    cout << *table.begin() << '\n';
}

I run the code on MacOS12, compile it with clang -std=c 17 xxx.cpp, but its output is unsure.

Sometimes it outputs:

0x7ff7bad974e0
0x7ff7bad974d0
1
0x7ff7bad974e0

But sometimes it outputs:

0x7ff7b4bdc4e0
0x7ff7b4bdc4d0
2
0x7ff7b4bdc4d0

Why do this happen?

According to the document of unordered_set,

Each element is inserted only if it is not equivalent to any other element already in the container (elements in an unordered_set have unique values).

CodePudding user response:

The short version is that your hash and equality operations are incompatible. When you insert an element, first the hash is taken, then the bucket for that hash is checked to see if an equivalent element is present.

Let's say there are three buckets named A, B, and C. You insert n1 and it ends up in, let's say, bucket B. Next you insert n2.

  • If n2 gets hashed to bucket B, then the equivalent n1 is found in that bucket, so n2 is not inserted.
  • If n2 gets hashed to bucket A (or C), then that bucket – and only that bucket – is checked to see if the element already exists. Bucket A is empty, so of course no equivalent element is found. So n2 is inserted.

To make your hash and equality operations compatible, equal things must compute to the same hash. That ensures that equal things will be assigned to the same bucket (ensures that n2 gets hashed to bucket B in the above example). If equality is based upon p1->value, then the hash better be based upon p1->value.

Documentation from cppreference.com:

If two Keys are equal according to Pred, Hash must return the same value for both keys.

  • Related