As I was looking through some string hash fucntions, I came across this one (code below). The function processes the string four bytes at a time, and interprets each of the four-byte chunks as a single long integer value. The integer values for the four-byte chunks are added together. In the end, the resulting sum is converted to the range 0 to M-1 using the modulus operator. The following is the function code :
// Use folding on a string, summed 4 bytes at a time
long sfold(String s, int M) {
int intLength = s.length() / 4;
long sum = 0;
for (int j = 0; j < intLength; j ) {
char c[] = s.substring(j * 4, (j * 4) 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k ) {
sum = c[k] * mult;
mult *= 256;
}
}
char c[] = s.substring(intLength * 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k ) {
sum = c[k] * mult;
mult *= 256;
}
return(Math.abs(sum) % M);
}
The confusion for me is this chunk of code, especially the first line.
char c[] = s.substring(intLength * 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k ) {
sum = c[k] * mult;
mult *= 256;
To my knowledge, the substring function used in this line takes as argument : begin index inclusive, The substring will start from the specified beginIndex and it will extend to the end of the string.
For the sake of example, let's assume we want to hash the following string : aaaabbbb. In this case intLength is going to be 2 (second line of function code). Replacing the value of intlength in s.substring(intLength * 4).toCharArray()
will give us s.substring(8).toCharArray()
which means string index is out of bounds given the string to be hashed has 8 characters.
I don't quite understand what's going on !
CodePudding user response:
This hash function is awful, but to answer your question:
There is no IndexOutOfBoundsException
, because "aaaabbbb".substring(8)
is ""
The purpose of that last loop is to deal with leftovers when the string length isn't a multiple of 4. When s
is "aaaabbbbcc"
, for example, then intLength == 2
, and s.substring(8)
is "cc"
.