Why does the Hashtable class not contain all prime numbers? The sequence of prime numbers (https://en.wikipedia.org/wiki/Prime_number) starts like this:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199
But the code in Hashtable.cs from the C# class library has this:
// Table of prime numbers to use as hash table sizes.
// A typical resize algorithm would pick the smallest prime number in this array
// that is larger than twice the previous capacity.
// Suppose our Hashtable currently has capacity x and enough elements are added
// such that a resize needs to occur. Resizing first computes 2x then finds the
// first prime in the table greater than 2x, i.e. if primes are ordered
// p_1, p_2, ..., p_i, ..., it finds p_n such that p_n-1 < 2x < p_n.
// Doubling is important for preserving the asymptotic complexity of the
// hashtable operations such as add. Having a prime guarantees that double
// hashing does not lead to infinite loops. IE, your hash function will be
// h1(key) + i*h2(key), 0 <= i < size. h2 and the size must be relatively prime.
public static readonly int[] primes = {
3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};
I don't quite understand why it works like this.
EDIT 1: To be clear, I am not asking why the array is so small. I am asking why it skips primes such as 5 or 13. For example: 7 * 2 = 14, so why does the array continue with 11 rather than 13?
CodePudding user response:
These prime numbers are used to determine the size of the hash table. E.g., we could make it 197 buckets or slots long. But it makes little sense to distinguish between sizes of 197 and 199, since hash tables are kept roughly 30% larger than their item count anyway to avoid too many key collisions. And, as the comment says, "Resizing first computes 2x then finds the first prime in the table greater than 2x". So finer-grained steps offer no advantage.
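For illustration, the lookup that comment describes boils down to something like this (just a sketch; NextTableSize is a made-up helper name, not the actual Hashtable API, and the real implementation may differ):

// Sketch: pick the first prime in the table that is greater than twice the current capacity.
static int NextTableSize(int currentCapacity, int[] primes)
{
    int target = 2 * currentCapacity;      // double first...
    foreach (int p in primes)
    {
        if (p > target)                    // ...then take the first table entry above 2x
            return p;
    }
    return target;                         // past the end of the table; the real code handles this differently
}

With the table above, a current capacity of 7 gives target = 14, and the first entry greater than 14 is 17.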
Therefore, they chose a more economical approach: for larger primes, two successive primes in this list differ by about 20%, and the steps are even bigger for the smaller primes. This has only a very small impact on memory usage.
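The deltas below can be reproduced with a quick sketch like this (my own throwaway code, assuming the primes array quoted in the question; the list is abbreviated here):

using System;

class PrimeDeltas
{
    static void Main()
    {
        // Same values as the primes array quoted from Hashtable.cs (shortened).
        int[] primes = { 3, 7, 11, 17, 23, /* ... */ 5999471, 7199369 };

        for (int i = 1; i < primes.Length; i++)
        {
            // Relative growth from one table size to the next, in percent.
            double delta = 100.0 * (primes[i] - primes[i - 1]) / primes[i - 1];
            Console.WriteLine($"{{{primes[i - 1],7}, {primes[i],7}}}, delta % = {delta:F2}");
        }
    }
}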
{ 3, 7}, delta % = 133.33
{ 7, 11}, delta % = 57.14
{ 11, 17}, delta % = 54.55
{ 17, 23}, delta % = 35.29
{ 23, 29}, delta % = 26.09
{ 29, 37}, delta % = 27.59
{ 37, 47}, delta % = 27.03
{ 47, 59}, delta % = 25.53
{ 59, 71}, delta % = 20.34
{ 71, 89}, delta % = 25.35
{ 89, 107}, delta % = 20.22
{ 107, 131}, delta % = 22.43
{ 131, 163}, delta % = 24.43
{ 163, 197}, delta % = 20.86
{ 197, 239}, delta % = 21.32
{ 239, 293}, delta % = 22.59
{ 293, 353}, delta % = 20.48
{ 353, 431}, delta % = 22.10
{ 431, 521}, delta % = 20.88
{ 521, 631}, delta % = 21.11
{ 631, 761}, delta % = 20.60
{ 761, 919}, delta % = 20.76
{ 919, 1103}, delta % = 20.02
{ 1103, 1327}, delta % = 20.31
{ 1327, 1597}, delta % = 20.35
{ 1597, 1931}, delta % = 20.91
{ 1931, 2333}, delta % = 20.82
{ 2333, 2801}, delta % = 20.06
{ 2801, 3371}, delta % = 20.35
{ 3371, 4049}, delta % = 20.11
{ 4049, 4861}, delta % = 20.05
{ 4861, 5839}, delta % = 20.12
{ 5839, 7013}, delta % = 20.11
{ 7013, 8419}, delta % = 20.05
{ 8419, 10103}, delta % = 20.00
{ 10103, 12143}, delta % = 20.19
{ 12143, 14591}, delta % = 20.16
{ 14591, 17519}, delta % = 20.07
{ 17519, 21023}, delta % = 20.00
{ 21023, 25229}, delta % = 20.01
{ 25229, 30293}, delta % = 20.07
{ 30293, 36353}, delta % = 20.00
{ 36353, 43627}, delta % = 20.01
{ 43627, 52361}, delta % = 20.02
{ 52361, 62851}, delta % = 20.03
{ 62851, 75431}, delta % = 20.02
{ 75431, 90523}, delta % = 20.01
{ 90523, 108631}, delta % = 20.00
{ 108631, 130363}, delta % = 20.01
{ 130363, 156437}, delta % = 20.00
{ 156437, 187751}, delta % = 20.02
{ 187751, 225307}, delta % = 20.00
{ 225307, 270371}, delta % = 20.00
{ 270371, 324449}, delta % = 20.00
{ 324449, 389357}, delta % = 20.01
{ 389357, 467237}, delta % = 20.00
{ 467237, 560689}, delta % = 20.00
{ 560689, 672827}, delta % = 20.00
{ 672827, 807403}, delta % = 20.00
{ 807403, 968897}, delta % = 20.00
{ 968897, 1162687}, delta % = 20.00
{1162687, 1395263}, delta % = 20.00
{1395263, 1674319}, delta % = 20.00
{1674319, 2009191}, delta % = 20.00
{2009191, 2411033}, delta % = 20.00
{2411033, 2893249}, delta % = 20.00
{2893249, 3471899}, delta % = 20.00
{3471899, 4166287}, delta % = 20.00
{4166287, 4999559}, delta % = 20.00
{4999559, 5999471}, delta % = 20.00
{5999471, 7199369}, delta % = 20.00
CodePudding user response:
If you consider all the primes less than or equal to 7199369, you get an array with several hundred thousand entries. How much time would it take to find p_n in an array that big (even with a binary search)? How much memory would it use? Is it worth it just to shave a bit off the memory usage of the hash tables? I suppose whoever wrote it considered 72 sizes a good enough trade-off. To answer the clarified question: they chose a growth factor of about 20% for the table of capacities and, given a prime in the array, searched for another prime roughly 20% bigger.
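To illustrate that last point, here is a rough sketch of how such a table could be generated (an assumption about the process, not the actual tool that was used; it will not reproduce the exact list, because the smallest entries grow by more than 20%):

using System;
using System.Collections.Generic;

class PrimeTableBuilder
{
    // Simple trial-division primality test; good enough for values of this size.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long)d * d <= n; d += 2)
            if (n % d == 0) return false;
        return true;
    }

    static void Main()
    {
        var sizes = new List<int> { 3 };
        const int limit = 7199369;    // last entry of the Hashtable table

        while (sizes[sizes.Count - 1] < limit)
        {
            // Next candidate: at least 20% above the previous entry,
            // then advanced to the next prime.
            int candidate = (int)(sizes[sizes.Count - 1] * 1.2) + 1;
            while (!IsPrime(candidate)) candidate++;
            sizes.Add(candidate);
        }

        Console.WriteLine(string.Join(", ", sizes));
    }
}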