Is there a known Java String with hashCode exactly equal to Integer.MIN_VALUE ? It would be helpful for writing a test for a hash table to help avoid a common mistake of running Math.Abs on the hashcode before performing the remainder operation.
Ideally the string would include only ASCII characters, but I'm not sure if it woul dbe feasible.
CodePudding user response:
Based on the formula for hash code (from StringLatin1
):
public static int hashCode(byte[] value) {
int h = 0;
for (byte v : value) {
h = 31 * h (v & 0xff);
}
return h;
}
it depends linearly on the characters, the longer the string and greater the characters, the greater the hash code will be, until it overflows. Also note that the first characters have a greater impact on the resulting hash code (more often multiplied by 31).
The first code starts testing the strings "A", "AA", "AAA", ...
until one starts returning a negative value - the previous string is used as starting value.
Now it starts incrementing the first character up to Z
or until a string with a negative hash is found. The same is done for every next character.
Since the hash code of such strings was not yet reaching Integer.MIN_VALUE
, an additional pass is done, to also test lowercase characters. This should have been integrated in the previous loop...
Now the last character is adjusted to exactly get to Integer.MIN_VALUE
- simple since the last character is just added, without multiplication to calculate the hash code.
Here the code:
var string = "A";
while ((string "A").hashCode() > 0) {
string = "A";
}
var array = string.toCharArray();
var i = 0;
while (i < array.length) {
array[i] = 1;
if (array[i] > 'z' || new String(array).hashCode() < 0) {
array[i] -= 1;
i = 1;
continue;
}
}
i = 1;
while (i < array.length) {
if (array[i] == 'Z') {
array[i] = 'a';
}else {
array[i] = 1;
}
if (array[i] > 'Z' || new String(array).hashCode() < 0) {
if (array[i] == 'a')
array[i] = 'Z';
else
array[i] -= 1;
i = 1;
continue;
}
}
int hash = new String(array).hashCode();
if (hash > 0) {
array[array.length-1] = Integer.MAX_VALUE - hash 1;
}
System.out.printf("%s = %d%n", new String(array), new String(array).hashCode());
This results in:
HZcxf_ = -2147483648
Merging the two incrementing loops of previous code, we have:
var string = "A";
while ((string "A").hashCode() > 0) {
string = "A";
}
var array = string.toCharArray();
var i = 0;
while (i < array.length) {
var prev = array[i];
if (prev == 'Z') {
array[i] = 'a';
} else {
array[i] = 1;
}
if (array[i] > 'z' || new String(array).hashCode() < 0) {
array[i] = prev;
i = 1;
continue;
}
}
int hash = new String(array).hashCode();
if (hash > 0) {
array[array.length-1] = Integer.MAX_VALUE - hash 1;
}
System.out.printf("%s = %d%n", new String(array), new String(array).hashCode());
Resulting in (slightly different than previous):
HZdZG_ = -2147483648
Another method would be more strongly based on the hash calculation, basically undoing it.
Since I did not want to work with negative number, it starts with Integer.MAX_VALUE
, which is one less than Integer.MIN_VALUE
(considering over/underflow).
First it finds out how often it must be divided by 31
until the result is less than 128 (ASCII), kind of determining the string length.
Next it loops and finds out each character with some special handling to avoid characters less than ' '.
At the end, the last character is incremented by one to move the hash code from MAX_VALUE
to MIN_VALUE
by overflowing.
var string = "";
var remain = Integer.MAX_VALUE;
var i = 0;
var multiplier = 1;
while (remain > 127) {
remain /= 31;
multiplier *= 31;
i = 1;
}
remain = Integer.MAX_VALUE;
while (i >= 0) {
var ch = (char)(remain / multiplier);
remain -= ch * multiplier;
multiplier /= 31;
if (i > 0) {
// correct if next ch will be less than ' '
var correct = (' ' - (remain / multiplier) 30) / 31; // old fashion rounding
if (correct > 0) {
ch -= correct;
remain = correct * 31 * multiplier;
}
} else {
ch = 1;
}
string = ch;
i -= 1;
}
System.out.printf("%s = %d%n", string, string.hashCode());
And its results:
I='<*! = -2147483648
Note: the last code will definitively fail if the hash code algorithm of String
is changed! The previous two may fail, depending on how the hash calculation is changed.
CodePudding user response:
String#hashCode()
is defined as:
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) s[1]*31^(n-2) ... s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Now you just need to solve for -2147483648
(perhaps with the restriction of only printable ASCII chars: 32–127) :)
Or you brute-force (this will take a while):
public class HashFinder {
private static final int SIZE = 7;
private static long hashesCalculated = 0L;
public static void main(String[] args) {
hashesCalculated = 0L;
final long start = System.nanoTime();
findHash(SIZE);
final long duration = System.nanoTime() - start;
System.err.println("Checked strings of size " SIZE);
System.err.println(hashesCalculated " hashes in " TimeUnit.NANOSECONDS.toSeconds(duration) "s");
}
public static void findHash(final int size) {
findHash("", size);
}
public static void findHash(final String prefix, final int size) {
if (size <= 0) {
return;
}
final StringBuilder sb = new StringBuilder(prefix).append(' ');
for (char c = ' '; c < '~'; c) {
sb.setCharAt(prefix.length(), c);
final String s = sb.toString();
hashesCalculated;
if (s.hashCode() == Integer.MIN_VALUE) {
System.out.printf("Found string with min hashCode! '%s'%n", s);
}
findHash(s, size - 1);
}
}
}