Home > database >  Why does Apple Clang make a call to compare for a unique hash in an unordered map?
Why does Apple Clang make a call to compare for a unique hash in an unordered map?

Time:04-26

I was trying to improve my understanding of the implementation of unordered_map and was surprised by this behavior. Consider this minimal example below.

#include <iostream>
#include <unordered_map>

using namespace std;

template<>
struct std::hash<int*>
{
    size_t operator()(int* arr) const
    {
        cout << "custom hash called" << endl;
        return arr[0];
    }
    
};


template <>
struct std::equal_to<int*>
{
    bool operator()(const int* lhs, const int* rhs) const
    {
        std::cout << "call to compare" << std::endl;
        return lhs == rhs;
    }
};

int main(int argc, char *argv[]) 
{   
    int arr1[8] {11,12,13,14,15,16,17,18};
    int arr2[8] {1,2,3,4,5,6,7,8};
    
    unordered_map<int*, string> myMap;
    myMap.insert(make_pair(arr1, "one"));
    myMap.insert({arr2, "two"});
}

I would have expected this output:

custom hash called
custom hash called

The hash for both inserts is unique and therefore no comparison of multiple keys should be required as I understand it (since the bucket should only contain exactly one key). And indeed this is the result when I try it with Clang, GCC and MSVC on godbolt.org. However, when I compile and run this example on a local Mac an additional call to the equal_to call operator happens for the second insert:

custom hash called
custom hash called
call to compare

Tested with

Apple clang version 13.1.6 (clang-1316.0.21.2)
Target: arm64-apple-darwin21.4.0
Thread model: posix

and

Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: x86_64-apple-darwin21.4.0
Thread model: posix

In all cases only the C 20 flag was used.

CodePudding user response:

There are basically two cases where the comparator does not need to be applied:

  1. The first one is when the target bucket is empty (then, there is nothing to compare with). A simple demo code that works with both libstdc and libc is as follows:
struct Hash {
  size_t operator()(int a) const { return a; }
};

struct Equal { ...  /* log operator call */ };

std::unordered_map<int, int, Hash, Equal> m; 
m.reserve(2);

std::cout << m.bucket(0) << std::endl;  // 0
std::cout << m.bucket(1) << std::endl;  // 1

m.insert({0, 0});
m.insert({1, 0})

Here, both keys 0 and 1 target different buckets, so there is no comparison with both implementations.

Live demo: https://godbolt.org/z/5jfYv6sba

  1. The second case is when all the keys in the target bucket have different hashes and those hashes are stored (cached) in the hash table nodes. This caching is supported by libstdc and seems to be applied by default. However, it does not seem to be supported by libc . Exemplary code:
std::unordered_map<int, int, Hash, Equal> m; 
m.reserve(2);

std::cout << m.bucket(0) << std::endl;  // 0
std::cout << m.bucket(2) << std::endl;  // 0

m.insert({0, 0});
m.insert({2, 0})

Here, both keys target the same bucket (with index 0). With libstdc , since the hashes are cached and are different, they are compared and there is no reason to additionally compare the entire keys. However, with libc , hashes are not cached and the keys need to be compared.

Live demo: https://godbolt.org/z/vWK4Ko7Yj

  • Related