When I call reserve(n)
on a libc 's empty unordered_set, libc find the next prime number as bucket_count
if n
is not a power of two, otherwise they just use n
. This also makes reserve(15)
have a bigger bucket_count than reserve(16)
.
if (__n == 1)
__n = 2;
else if (__n & (__n - 1))
__n = __next_prime(__n);
I am wondering is there any useful case of this?
I known that some hash-table implementation can use a power of two as size to avoid modulo calculations. But that seems make sense only if it is known at compile-time that only power-of-two numbers can be used.
Testing Demo: https://godbolt.org/z/W5joovP4q
#include <cstdio>
#include <unordered_set>
bool is_prime(int x) {
if (x < 2) return false;
if (x % 2 == 0) return x == 2;
for (int i = 3; i * i <= x; i = 2)
if (x % i == 0) return false;
return true;
}
int next_prime(int x) {
while (!is_prime(x)) x;
return x;
}
__attribute__((noinline))
void test(int i) {
std::unordered_set<char> s;
s.reserve(i);
int a = s.bucket_count();
int b = next_prime(i);
if (a != b) {
printf("m m m\n", i, a, b);
}
}
int main() {
for (int i = 0; i < 3000; i)
test(i);
}
Program stdout:
0 0 2
4 4 5
8 8 11
16 16 17
32 32 37
64 64 67
128 128 131
256 256 257
512 512 521
1024 1024 1031
2048 2048 2053
CodePudding user response:
The original commit message sheds some light on this. In short, libc originally used just prime numbers. This commit introduces an optimization for power-of-2 numbers in case the user explicitly requests them.
Also note that the new function __constrain_hash()
in that commit checks if it is a power-of-2 and then does not use the modulo operation. According to the commit message, the cost for the additional check if the input is a power-of-2 number is outweighed by not using a modulo operation. So even if you do not know the information at compile time, you can get a performance boost.
Quote of the commit message:
This commit establishes a new bucket_count policy in the unordered containers: The policy now allows a power-of-2 number of buckets to be requested (and that request honored) by the client. And if the number of buckets is set to a power of 2, then the constraint of the hash to the number of buckets uses & instead of %. If the client does not specify a number of buckets, then the policy remains unchanged: a prime number of buckets is selected. The growth policy is that the number of buckets is roughly doubled when needed. While growing, either the prime, or the power-of-2 strategy will be preserved. There is a small run time cost for putting in this switch. For very cheap hash functions, e.g. identity for int, the cost can be as high as 18%. However with more typical use cases, e.g. strings, the cost is in the noise level. I've measured cases with very cheap hash functions (int) that using a power-of-2 number of buckets can make look up about twice as fast. However I've also noted that a power-of-2 number of buckets is more susceptible to accidental catastrophic collisions. Though I've also noted that accidental catastrophic collisions are also possible when using a prime number of buckets (but seems far less likely). In short, this patch adds an extra tuning knob for those clients trying to get the last bit of performance squeezed out of their hash containers. Casual users of the hash containers will not notice the introduction of this tuning knob. Those clients who swear by power-of-2 hash containers can now opt-in to that strategy. Clients who prefer a prime number of buckets can continue as they have.
CodePudding user response:
The commit that introduces this change is here: https://github.com/llvm/llvm-project/commit/4cb38a82a26cb3f4ac86834659e0af5f873c1d5b
It has a long commit message, but it is about deliberately honoring a power-of-two bucket count.
If you take a look at the changes, if the bucket_count is a power of two, the implementation uses &
instead of %
. It's not as efficient as having this at compile time, but the commit message claims it is not as expensive as it looks for complex hash functions.
size_t
__constrain_hash(size_t __h, size_t __bc)
{
return !(__bc & (__bc - 1)) ? __h & (__bc - 1) : __h % __bc;
}