Is there a hash algorithm that can efficiently calculate hashes of concatenated data?

Time:12-18

Say we have a hash function H and two byte strings a and b (possibly long, e.g. several MiB each, so we want to avoid hashing them again). We already know the values of H(a) and H(b), and want to calculate H(a + b), the hash of the two strings concatenated together.

We'd like to have a function F that can calculate H(a + b) from H(a), H(b) and any other properties of a and b we can calculate beforehand (e.g. their lengths), and that takes less time than hashing the whole concatenated string.

The hash function H doesn't need to be cryptographic, but should be good enough for HashMaps or similar usages.

Do such functions H and F exist? If so, what should I search for or research to learn more?

CodePudding user response:

Java's string hash is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in 32-bit int arithmetic (i.e. modulo 2^32).

A property of this hash is that H(a + b) = 31^(b.length()) * H(a) + H(b). You can compute 31^(b.length()) using exponentiation by squaring, in time logarithmic in the length of b. If you wish to precompute, you can precompute 31^length for each of your strings and store it alongside the precomputed hash.
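
As a rough sketch of the idea above (the class name `ConcatHash` and the helpers `pow31` and `combine` are made up for illustration, not part of any library), the combination step could look like this in Java. It relies on int overflow wrapping modulo 2^32, which is the same arithmetic `String.hashCode()` uses:

```java
class ConcatHash {
    // Computes 31^n with wrapping int arithmetic (modulo 2^32)
    // using exponentiation by squaring: O(log n) multiplications.
    public static int pow31(int n) {
        int result = 1;
        int base = 31;
        while (n > 0) {
            if ((n & 1) == 1) {
                result *= base;
            }
            base *= base;
            n >>>= 1;
        }
        return result;
    }

    // Combines the hashes of a and b into the hash of a + b.
    // Only the length of b is needed, not the strings themselves.
    public static int combine(int hashA, int hashB, int lengthB) {
        return pow31(lengthB) * hashA + hashB;
    }

    public static void main(String[] args) {
        String a = "hello, ";
        String b = "world";
        int combined = combine(a.hashCode(), b.hashCode(), b.length());
        // Compare against hashing the concatenated string directly.
        System.out.println(combined == (a + b).hashCode());
    }
}
```

Note that `String.hashCode()` works on UTF-16 chars rather than bytes; for byte strings the same polynomial scheme applies with bytes in place of chars.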

CodePudding user response:

You need to say more about what you expect from your hash function. Otherwise my answer is: use bit parity as the hash, because B(a + b) = (B(a) + B(b)) mod 2.
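
To make the parity point concrete, here is a small sketch (the class `ParityHash` and its methods are hypothetical names chosen for this example). It satisfies the combination rule trivially, but it only ever produces 0 or 1, which is why the requirements on H matter:

```java
class ParityHash {
    // Returns 1 if the total number of set bits in data is odd, else 0.
    public static int parity(byte[] data) {
        int p = 0;
        for (byte x : data) {
            p ^= Integer.bitCount(x & 0xFF) & 1;
        }
        return p;
    }

    // Parity of a concatenation, computed from the two parities alone.
    public static int combine(int parityA, int parityB) {
        return (parityA + parityB) % 2;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {7};
        byte[] ab = {1, 2, 3, 7};
        System.out.println(combine(parity(a), parity(b)) == parity(ab));
    }
}
```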
