Home > Software engineering >  Why is the hash function O(1)
Why is the hash function O(1)

Time:07-12

Why does finding the hash of a given string only run in constant time?

I am trying to write an optimized program to compare two strings by using string hashing.

From what I know, the hash of a string is usually defined by a polynomial rolling hash function. Online sources say calculating this hash and doing the comparison is O(1). For example, https://cp-algorithms.com/string/string-hashing.html says that

The idea behind the string hashing is the following: we map each string into an integer and compare those instead of the strings. Doing this allows us to reduce the execution time of the string comparison to O(1).

However, the actual implementation (according to https://cp-algorithms.com/string/string-hashing.html) of this hash function includes looping through the entire string:

long long compute_hash(string const& s) {
const int p = 31;
const int m = 1e9   9;
long long hash_value = 0;
long long p_pow = 1;
for (char c : s) {
    hash_value = (hash_value   (c - 'a'   1) * p_pow) % m;
    p_pow = (p_pow * p) % m;
}
return hash_value;
}

Wouldn't this guarantee linear time complexity to compare two strings, instead of O(1)? Any help is appreciated!

CodePudding user response:

Big-O notation is a way of describing how the execution time of an algorithm will grow with the size of the data-set it is working on. In order for that definition to be applied, we have to specify what the data-set is that will be growing.

In most cases that's obvious, but sometimes it's a bit ambiguous, and this is one of those cases. Does "data set" in this case refer to an individual string? If so, then the algorithm is O(N), since the sequence of operations it has to perform grows linearly with the size of the string. But if by "data set" you mean the number of items in the associated hash table, then the algorithm can be considered O(1), since it will operate just as quickly (for any given string) when dealing with a million-entry Hashtable as it would for a two-entry Hashtable.

Regarding using hashing to compare two strings: once you have computed the hash values for the two strings, and stored those hash values somewhere, you can compare the two hash values in O(1) time (since comparing two integers is a fixed-cost operation, regardless of the sizes of the strings they were calculated from). It's important to note, however, that this comparison can only tell you if the two strings are different -- if the two hash values are equal, you still can't be 100% certain that the two strings are equal, since (due to the pigeonhole principle) two different strings can hash to the same value. So in that case you'd still need to compare the strings the regular O(N)/char-by-char way.

  • Related