Home > Enterprise >  Missing bounds checking elimination in String constructor?
Missing bounds checking elimination in String constructor?

Time:12-11

Looking into UTF8 decoding performance, I noticed the performance of protobuf's UnsafeProcessor::decodeUtf8 is better than String(byte[] bytes, int offset, int length, Charset charset) for the following non ascii string: "Quizdeltagerne spiste jordb\u00e6r med fl\u00f8de, mens cirkusklovnen".

I tried to figure out why, so I copied the relevant code in String and replaced the array accesses with unsafe array accesses, same as UnsafeProcessor::decodeUtf8. Here are the JMH benchmark results:

Benchmark                       Mode  Cnt    Score   Error  Units
StringBenchmark.safeDecoding    avgt   10  127.107 ± 3.642  ns/op
StringBenchmark.unsafeDecoding  avgt   10  100.915 ± 4.090  ns/op

I assume the difference is due to missing bounds checking elimination which I expected to kick in, especially since there is an explicit bounds check in the form of a call to checkBoundsOffCount(offset, length, bytes.length) in the beginning of String(byte[] bytes, int offset, int length, Charset charset).

Is the issue really a missing bounds check elimination?

Here's the code I benchmarked using OpenJDK 17 & JMH. Note that this is only part of the String(byte[] bytes, int offset, int length, Charset charset) constructor code, and works correctly only for this specific German String. The static methods were copied from String. Look for the // the unsafe version: comments that indicate where I replaced the safe access with unsafe.

    private static byte[] safeDecode(byte[] bytes, int offset, int length) {
        checkBoundsOffCount(offset, length, bytes.length);
        int sl = offset   length;
        int dp = 0;
        byte[] dst = new byte[length];
        while (offset < sl) {
            int b1 = bytes[offset];
            // the unsafe version:
            // int b1 = UnsafeUtil.getByte(bytes, offset);
            if (b1 >= 0) {
                dst[dp  ] = (byte)b1;
                offset  ;
                continue;
            }
            if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
                    offset   1 < sl) {
                // the unsafe version:
                // int b2 = UnsafeUtil.getByte(bytes, offset   1);
                int b2 = bytes[offset   1];
                if (!isNotContinuation(b2)) {
                    dst[dp  ] = (byte)decode2(b1, b2);
                    offset  = 2;
                    continue;
                }
            }
            // anything not a latin1, including the repl
            // we have to go with the utf16
            break;
        }
        if (offset == sl) {
            if (dp != dst.length) {
                dst = Arrays.copyOf(dst, dp);
            }
            return dst;
        }

        return dst;
    }

Followup

Apparently if I change the while loop condition from offset < sl to 0 <= offset && offset < sl I get similar performance in both versions:

Benchmark                       Mode  Cnt    Score    Error  Units
StringBenchmark.safeDecoding    avgt   10  100.802 ± 13.147  ns/op
StringBenchmark.unsafeDecoding  avgt   10  102.774 ± 3.893  ns/op

CodePudding user response:

To measure the branch you are interested in and particularly the scenario when while loop becomes hot, I've used the following benchmark:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class StringConstructorBenchmark {
  private byte[] array;

  @Setup
  public void setup() {
    String str = "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen. Я";
    array = str.getBytes(StandardCharsets.UTF_8);
  }

  @Benchmark
  public String newString()  {
      return new String(array, 0, array.length, StandardCharsets.UTF_8);
  }
}

And indeed, with modified constructor it does give significant improvement:

//baseline
Benchmark                             Mode  Cnt    Score   Error  Units
StringConstructorBenchmark.newString  avgt   50  173,092 ± 3,048  ns/op

//patched
Benchmark                             Mode  Cnt    Score   Error  Units
StringConstructorBenchmark.newString  avgt   50  126,908 ± 2,355  ns/op

This is likely to be a HotSpot issue: optimizing compiler for some reason failed to eliminate array bounds check within while-loop. I guess the reason is that offset is modified within the loop:

while (offset < sl) {
    int b1 = bytes[offset];
    if (b1 >= 0) {
        dst[dp  ] = (byte)b1;
        offset  ;                     // <---
        continue;
    }
    if ((b1 == (byte)0xc2 || b1 == (byte)0xc3) &&
            offset   1 < sl) {
        int b2 = bytes[offset   1];
        if (!isNotContinuation(b2)) {
            dst[dp  ] = (byte)decode2(b1, b2);
            offset  = 2;
            continue;
        }
    }
    // anything not a latin1, including the repl
    // we have to go with the utf16
    break;
}

Also I've looked into the code via LinuxPerfAsmProfiler, here's the link for baseline https://gist.github.com/stsypanov/d2524f98477d633fb1d4a2510fedeea6 and this one is for patched constructor https://gist.github.com/stsypanov/16c787e4f9fa3dd122522f16331b68b7

What should one pay attention to? Let's find the code corresponding int b1 = bytes[offset]; (line 538). In baseline we have this:

  3.62%           ││            │  0x00007fed70eb4c1c:   mov               
  • Related