Why does a compiled Java regex work slower than the interpreted one in String::split?

Time:07-20

I'm trying to improve the following code:

    public int applyAsInt(String ipAddress) {
        var ipAddressInArray = ipAddress.split("\\.");
        ...

So I compile the regular expression into a static constant:

    private static final Pattern PATTERN_DOT = Pattern.compile(".", Pattern.LITERAL);

    public int applyAsInt(String ipAddress) {
        var ipAddressInArray = PATTERN_DOT.split(ipAddress);
        ...

The rest of the code remained unchanged.

To my amazement, the new code is slower than the previous one. Below are the test results:

Benchmark                                (ipAddress)  Mode  Cnt    Score    Error  Units
ConverterBenchmark.mkyongConverter           1.2.3.4  avgt   10  166.456 ±  9.087  ns/op
ConverterBenchmark.mkyongConverter       120.1.34.78  avgt   10  168.548 ±  2.996  ns/op
ConverterBenchmark.mkyongConverter   129.205.201.114  avgt   10  180.754 ±  6.891  ns/op
ConverterBenchmark.mkyong2Converter          1.2.3.4  avgt   10  253.318 ±  4.977  ns/op
ConverterBenchmark.mkyong2Converter      120.1.34.78  avgt   10  263.045 ±  8.373  ns/op
ConverterBenchmark.mkyong2Converter  129.205.201.114  avgt   10  331.376 ± 53.092  ns/op

Help me understand why this is happening, please.

CodePudding user response:

String.split has code aimed at exactly this use case:

https://github.com/openjdk/jdk17u/blob/master/src/java.base/share/classes/java/lang/String.java#L3102

        /* fastpath if the regex is a
         * (1) one-char String and this character is not one of the
         *     RegEx's meta characters ".$|()[{^?*+\\", or
         * (2) two-char String and the first char is the backslash and
         *     the second is not the ascii digit or ascii letter.
         */

That means split("\\.") does not actually run a regular expression. The argument is a backslash followed by a character that is neither an ASCII digit nor an ASCII letter, so case (2) applies and the method splits the string directly at the '.' characters.

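This optimization is not available when you write PATTERN_DOT.split(ipAddress): in JDK 17, Pattern.split runs the input through a Matcher even when the pattern was compiled with Pattern.LITERAL.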
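
To make the difference concrete, here is a minimal sketch of the kind of work the fast path does, next to the two calls from the question. The class name SplitFastPathSketch and the helper splitOnChar are hypothetical names made up for illustration. The helper is not the JDK's code, only a simplified approximation of splitting with indexOf and substring when no limit is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitFastPathSketch {
    // Same constant as in the question.
    private static final Pattern PATTERN_DOT = Pattern.compile(".", Pattern.LITERAL);

    // Hypothetical helper (an assumption, not JDK code): splits at a single
    // character with indexOf/substring, without any regex machinery.
    static String[] splitOnChar(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int next;
        while ((next = input.indexOf(delimiter, start)) != -1) {
            parts.add(input.substring(start, next));
            start = next + 1;
        }
        // No delimiter found: String.split returns the input unchanged.
        if (start == 0) {
            return new String[] {input};
        }
        parts.add(input.substring(start));
        // String.split without a limit drops trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String ip = "129.205.201.114";
        // Eligible for the String.split fast path.
        System.out.println(Arrays.toString(ip.split("\\.")));
        // Goes through Pattern and Matcher.
        System.out.println(Arrays.toString(PATTERN_DOT.split(ip)));
        // Hand-written equivalent of the direct character split.
        System.out.println(Arrays.toString(splitOnChar(ip, '.')));
    }
}
```

All three calls return the same four parts for this input. Only the amount of work behind each call differs, which is what the benchmark above measures.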
