Matcher.replaceAll() removes backslash even when I escape it. Java

Time:01-21

I have functionality in my app that replaces some text in JSON (I have simplified it in the example). The replacement may contain escape sequences such as \n, \b, \t, etc., which can break the JSON string when I try to build JSON with Jackson. So I decided to use Apache's solution, StringEscapeUtils.escapeJava(), to escape all of them. But Matcher.replaceAll() removes the backslashes added by escapeJava().

Here is the code:

public static void main(String[] args) {
    String json = "{\"test2\": \"Hello toReplace \\\"test\\\" world\"}";

    String replacedJson = Pattern.compile("toReplace")
            .matcher(json)
            .replaceAll(StringEscapeUtils.escapeJava("replacement \n \b \t"));

    System.out.println(replacedJson);
}

Expected Output:

{"test2": "Hello replacement \n \b \t \"test\" world"}

Actual Output:

{"test2": "Hello replacement n b t \"test\" world"}

Why does Matcher.replaceAll() remove the backslashes, while System.out.println(StringEscapeUtils.escapeJava("replacement \n \b \t")); prints the correct output, replacement \n \b \t?

CodePudding user response:

StringEscapeUtils.escapeJava("\n") transforms the single newline character \n into two characters: \ and n.

\ is a special character in pattern replacements though, from https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#replaceAll(java.lang.String):

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
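As a rough illustration of the rule quoted above, here is a minimal sketch using only java.util.regex. The class name ReplacementSyntaxDemo and the helper replace are made up for this example; they are not part of the question's code.

```java
import java.util.regex.Pattern;

public class ReplacementSyntaxDemo {
    // Hypothetical helper: replaces every match of regex in input, with the
    // replacement interpreted by Matcher.replaceAll (backslash escapes, $ groups).
    static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // The replacement holds two characters, backslash and n; the backslash is consumed.
        System.out.println(replace("a-b", "-", "\\n"));     // prints: anb
        // A doubled backslash in the replacement yields one literal backslash.
        System.out.println(replace("a-b", "-", "\\\\n"));   // prints: a\nb
        // $1 refers to the first capturing group.
        System.out.println(replace("a-b", "(-)", "[$1]"));  // prints: a[-]b
    }
}
```

The first call reproduces the behaviour from the question in miniature: the backslash disappears and only the n survives.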

To have them taken as literal characters, you need to escape them via Matcher.quoteReplacement, from https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#quoteReplacement(java.lang.String):

Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes (\) and dollar signs ($) will be given no special meaning.

So in your case:

.replaceAll(Matcher.quoteReplacement(StringEscapeUtils.escapeJava("replacement \n \b \t")))
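A self-contained sketch of that fix, for anyone who wants to try it without Apache Commons on the classpath. It assumes that escapeJava turns the three control characters into the two-character sequences \n, \b and \t (as the question's println output shows), and hard-codes that result in the variable escaped. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Hypothetical helper: replaces every match of regex, taking the replacement literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String json = "{\"test2\": \"Hello toReplace \\\"test\\\" world\"}";
        // Assumed stand-in for StringEscapeUtils.escapeJava("replacement \n \b \t"):
        // backslash + n, backslash + b, backslash + t as separate characters.
        String escaped = "replacement \\n \\b \\t";
        // prints: {"test2": "Hello replacement \n \b \t \"test\" world"}
        System.out.println(replaceLiterally(json, "toReplace", escaped));
    }
}
```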

CodePudding user response:

If you want a literal backslash in replaceAll, you need to escape it. This is described in the Matcher.replaceAll documentation.

StringEscapeUtils.escapeJava escapes a string so that it is suitable for use in Java source code, but it does not change how the compiler reads the literal you pass to it. In the literal below, each escape sequence is already a single control character:

"replacement \n \b \t"
             ^ new line
                 ^ backspace
                    ^ tab

If you want literal backslashes in a regular Java string, you need:

"replacement \\n \\b \\t"

Because this Java string is the replacement argument of replaceAll, where a backslash is itself an escape character, you need:

"replacement \\\\n \\\\b \\\\t"

Try:

    String replacedJson = Pattern.compile("toReplace")
            .matcher(json)
            .replaceAll("replacement \\\\n \\\\b \\\\t");
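To make the two layers of escaping concrete, the sketch below compares the hand-escaped replacement with one produced by Matcher.quoteReplacement. The class name EscapingLayersDemo and the shortened input string are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapingLayersDemo {
    // Replacement written by hand: each literal backslash is doubled for the
    // replacement syntax, then doubled again for the Java string literal.
    static String manual(String input) {
        return Pattern.compile("toReplace")
                .matcher(input)
                .replaceAll("replacement \\\\n \\\\b \\\\t");
    }

    // Same result, letting Matcher.quoteReplacement add the replacement-level escaping.
    static String quoted(String input) {
        return Pattern.compile("toReplace")
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement("replacement \\n \\b \\t"));
    }

    public static void main(String[] args) {
        String input = "Hello toReplace world";
        System.out.println(manual(input));                       // prints: Hello replacement \n \b \t world
        System.out.println(manual(input).equals(quoted(input))); // prints: true
    }
}
```

Writing four backslashes by hand works for a fixed string; when the replacement comes from user input or from another method, quoting it programmatically is less error-prone.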

CodePudding user response:

You have to escape \ as well using Matcher.quoteReplacement().

public static String replaceAll(String json, String regex, String replace) {
    return Pattern.compile(regex)
                  .matcher(json)
                  .replaceAll(Matcher.quoteReplacement(StringEscapeUtils.escapeJava(replace)));
}