Home > database >  Rabin Karp Algorithm Negative Hash
Rabin Karp Algorithm Negative Hash

Time:12-15

I have this Rabin Karp implementation. Now the only thing I'm doing for rolling hash is subtract power*source[i] from the sourceHash. power is 31^target.size()-1 % mod But I can't understand why we're adding mod to sourceHash when it becomes negative. I have tried adding other values but it doesn't work and it only works when we add mod. Why is this? Is there a specific reason why we're adding mod and not anything else (like a random big number for example).

int rbk(string source, string target){
        int m = target.size();
        int n = source.size();
        int mod = 128;
        int prime = 11;
        int power = 1;
        int targetHash = 0, sourceHash = 0;
        for(int i = 0; i < m - 1; i  ){
            power =(power*prime) % mod;
        }
        for(int i = 0; i < target.size(); i  ){
            sourceHash = (sourceHash*prime   source[i]) % mod;
            targetHash = (targetHash*prime   target[i]) % mod;
        }
        
        for(int i = 0; i < n-m 1; i  ){
            if(targetHash == sourceHash){
                bool flag = true;
                for(int j = 0; j < m; j  ){
                    if(source[i j] != target[j]){
                        flag = false;
                        break;
                    }
                }
                if(flag){
                    return 1;
                }
            }
            
            if(i < n-m){
                sourceHash = (prime*(sourceHash - source[i]*power)   source[i m]) % mod;
                if(sourceHash < 0){
                    sourceHash  = mod;
                }
            }
        }
        return -1;
}

CodePudding user response:

When using modulo arithmetics (mod n) we have just n distinct numbers: 0, 1, 2, ..., n - 1. All the other numbers which out of 0 .. n - 1 are equal to some number in 0 .. n - 1:

-n     ~ 0
-n   1 ~ 1
-n   2 ~ 2
 ...
-2     ~ n - 2
-1     ~ n - 1
   

or

 n     ~ 0
 n   1 ~ 1
 n   2 ~ 2
 ...
 2 * n     ~ 0
 2 * n   1 ~ 0

In general case A ~ B if and only if (A - B) % n = 0 (here % stands for remainder).

When implementing Rabin Karp algorithm we can have two potential problems:

  1. Hash can be too large, we can face integer overflow
  2. Negative remainder can be implemented in different way on different compilers: -5 % 3 == -2 == 1

To deal with both problems, we can normalize remainder and operate with numbers within safe 0 .. n - 1 range only. For arbitrary value A we can put

 A = (A % n   n) % n;
  • Related