Home > front end >  Problem with "equality" in std::unordered_set
Problem with "equality" in std::unordered_set

Time:09-12

I see some behaviour in std::unordered_set that I do not understand.

I created a struct that also provides a "hash" function and an "equality" function. The struct contains values for an interval, the lower and upper bound. I use the struc in the std::unordered_set definition als as the "hash" and "equal_to" template argument.

I want to add only disjoint intervals to a std::unordered_set. So, I use a selfmade disjoint-function and consider 2 intervals to be equal, if they overlap.

I have written some test function below.

Why can I insert intervals [1,5] and [2,3] although they overlap and are considered to be equal? Where is my stupid mistake?

Please note: While debugging it, I noticed, that in most cases the comparison function is not even called. This I do not understand.

#include <iostream>
#include <fstream>
#include <vector>
#include <unordered_set>

struct Interval {
    int left{};
    int right{};
    bool isDisjoint(const Interval& other) const {
        return left < other.right && right < other.left;
    }
    // Hash function
    std::size_t operator()(const Interval& i) const {
        std::size_t result = static_cast<std::size_t>(i.left)   (static_cast<std::size_t>(i.right) << 16);
        return result;
    }
    // Comparison for equality
    bool operator ()(const Interval& lhs, const Interval& rhs) const  {
        bool result =  not lhs.isDisjoint(rhs);
        return  result;
    }
    //bool operator== (const Interval& other) const {
    //    bool result = not isDisjoint(other);
    //    return  result;
    //}

    // Simple output
    friend std::ostream& operator << (std::ostream& os, const Interval& i) {
        return os << '[' << i.left << ','<< i.right << ']';
    }
};

std::vector<Interval> intervals{ {1,5},{2,3},{1,10},{5,10},{6,8} };

int main() {

    std::unordered_set< Interval, Interval, Interval> disjointIntervals{};

    // Special test for first 2 intervals
    std::cout << ((not intervals[0].isDisjoint(intervals[1]))?"[1,5] is equal to [2,3]": "[1,5] is NOT equal to [2,3]") << '\n';

    for (const auto& i : intervals) {
        // Test for hash values. They are all different
        std::cout << "Interval: " << (static_cast<std::size_t>(i.left)   (static_cast<std::size_t>(i.right) << 16)) << " \tHash: " << (i.left   (i.right << 16)) << '\n';
        disjointIntervals.insert(i);
    }
    for (const auto& i : disjointIntervals)
        std::cout << i << '\n';
}

CodePudding user response:

std::unordered_set is a hash set. Elements will be organized into buckets, where the hash of an element identifies the bucket it's in. (That's how you get amortized O(1) insertion and lookup).

By the nature of hashing, different objects can produce the same hash. In this case and probably only in this case the comparison will be called to check wether two elements with equal hashes are indeed equal or not.

The two ranges [1,5] and [2,3] produce different hashes, thus will never be checked for equality.


It's not possible to produce identical hashes for overlapping ranges like you intend to. (unless the hash is always the same, which kind of defies the point of using a hash set).

I suggest to use different data structure. Maybe a sorted std::vector.

CodePudding user response:

The named requirement Hash states:

All evaluations of h(k) executed within a given execution of a program yield the same result for the same value of k.

Also for std::hash, the same in different words:

For two parameters k1 and k2 that are equal, std::hash()(k1) == std::hash()(k2).

Roughly speaking, elements with the same hash end up in the same bucket, and because different elements may end up to have the same hash, only then they have to be checked for equality.

Your hash function does not produce the same hash for keys that are equal and thats wrong.

  • Related