Looking into UTF8 decoding performance, I noticed the performance of protobuf's UnsafeProcessor::decodeUtf8
is better than String(byte[] bytes, int offset, int length, Charset charset)
for the following non ascii string: "Quizdeltagerne spiste jordb\u00e6r med fl\u00f8de, mens cirkusklovnen"
.
I tried to figure out why, so I copied the relevant code in String
and replaced the array accesses with unsafe array accesses, same as UnsafeProcessor::decodeUtf8
.
Here are the JMH benchmark results:
Benchmark Mode Cnt Score Error Units
StringBenchmark.safeDecoding avgt 10 127.107 ± 3.642 ns/op
StringBenchmark.unsafeDecoding avgt 10 100.915 ± 4.090 ns/op
I assume the difference is due to missing bounds checking elimination which I expected to kick in, especially since there is an explicit bounds check in the form of a call to checkBoundsOffCount(offset, length, bytes.length)
in the beginning of String(byte[] bytes, int offset, int length, Charset charset)
.
Is the issue really a missing bounds check elimination?
Here's the code I benchmarked using OpenJDK 17 & JMH. Note that this is only part of the String(byte[] bytes, int offset, int length, Charset charset)
constructor code, and works correctly only for this specific German String.
The static methods were copied from String
.
Look for the // the unsafe version:
comments that indicate where I replaced the safe access with unsafe.
private static byte[] safeDecode(byte[] bytes, int offset, int length) {
checkBoundsOffCount(offset, length, bytes.length);
int sl = offset length;
int dp = 0;
byte[] dst = new byte[length];
while (offset < sl) {
int b1 = bytes[offset];
// the unsafe version:
// int b1 = UnsafeUtil.getByte(bytes, offset);
if (b1 >= 0) {
dst[dp ] = (byte)b1;
offset ;
continue;
}
if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
offset 1 < sl) {
// the unsafe version:
// int b2 = UnsafeUtil.getByte(bytes, offset 1);
int b2 = bytes[offset 1];
if (!isNotContinuation(b2)) {
dst[dp ] = (byte)decode2(b1, b2);
offset = 2;
continue;
}
}
// anything not a latin1, including the repl
// we have to go with the utf16
break;
}
if (offset == sl) {
if (dp != dst.length) {
dst = Arrays.copyOf(dst, dp);
}
return dst;
}
return dst;
}
Followup
Apparently if I change the while loop condition from offset < sl
to 0 <= offset && offset < sl
I get similar performance in both versions:
Benchmark Mode Cnt Score Error Units
StringBenchmark.safeDecoding avgt 10 100.802 ± 13.147 ns/op
StringBenchmark.unsafeDecoding avgt 10 102.774 ± 3.893 ns/op
CodePudding user response:
To measure the branch you are interested in and particularly the scenario when while
loop becomes hot, I've used the following benchmark:
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class StringConstructorBenchmark {
private byte[] array;
@Setup
public void setup() {
String str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. Я";
array = str.getBytes(StandardCharsets.UTF_8);
}
@Benchmark
public String newString() {
return new String(array, 0, array.length, StandardCharsets.UTF_8);
}
}
And indeed, with modified constructor it does give significant improvement:
//baseline
Benchmark Mode Cnt Score Error Units
StringConstructorBenchmark.newString avgt 50 173,092 ± 3,048 ns/op
//patched
Benchmark Mode Cnt Score Error Units
StringConstructorBenchmark.newString avgt 50 126,908 ± 2,355 ns/op
This is likely to be a HotSpot issue: optimizing compiler for some reason failed to eliminate array bounds check within while
-loop. I guess the reason is that offset
is modified within the loop:
while (offset < sl) {
int b1 = bytes[offset];
if (b1 >= 0) {
dst[dp ] = (byte)b1;
offset ; // <---
continue;
}
if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
offset 1 < sl) {
int b2 = bytes[offset 1];
if (!isNotContinuation(b2)) {
dst[dp ] = (byte)decode2(b1, b2);
offset = 2;
continue;
}
}
// anything not a latin1, including the repl
// we have to go with the utf16
break;
}
Also I've looked into the code via LinuxPerfAsmProfiler
, here's the link for baseline https://gist.github.com/stsypanov/d2524f98477d633fb1d4a2510fedeea6 and this one is for patched constructor https://gist.github.com/stsypanov/16c787e4f9fa3dd122522f16331b68b7
What should one pay attention to? Let's find the code corresponding int b1 = bytes[offset];
(line 538). In baseline we have this:
3.62% ││ │ 0x00007fed70eb4c1c: mov