I have functionality in my app that replaces some text in JSON (I have simplified it in the example). The replacement may contain escape sequences such as \n, \b, \t, etc., which can break the JSON string when I try to build JSON with Jackson. So I decided to use Apache's StringEscapeUtils.escapeJava() to escape all of them. But Matcher.replaceAll() removes the backslashes that escapeJava() added.
Here is the code:
public static void main(String[] args) {
    String json = "{\"test2\": \"Hello toReplace \\\"test\\\" world\"}";
    String replacedJson = Pattern.compile("toReplace")
            .matcher(json)
            .replaceAll(StringEscapeUtils.escapeJava("replacement \n \b \t"));
    System.out.println(replacedJson);
}
Expected Output:
{"test2": "Hello replacement \n \b \t \"test\" world"}
Actual Output:
{"test2": "Hello replacement n b t \"test\" world"}
Why does Matcher.replaceAll() remove the backslashes, while System.out.println(StringEscapeUtils.escapeJava("replacement \n \b \t")); prints the correct output: replacement \n \b \t
CodePudding user response:
StringEscapeUtils.escapeJava("\n") transforms the single newline character into two characters: \ and n.
However, \ is a special character in regex replacement strings. From https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#replaceAll(java.lang.String):
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
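To make the quoted rule concrete, here is a small sketch using only the JDK. The class name ReplacementEscapesDemo and the helper replaceX are made up for illustration; the inputs are arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplacementEscapesDemo {

    // Hypothetical helper: replaces every "X" using regex replacement-string rules.
    static String replaceX(String input, String replacement) {
        return Pattern.compile("X").matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // The Java literal "\\n" is two characters: a backslash and 'n'.
        // In a replacement string the backslash escapes the 'n', so only "n" is emitted.
        System.out.println(replaceX("a X b", "\\n"));   // a n b

        // "$0" is a reference to the whole match, not literal text.
        System.out.println(replaceX("a X b", "[$0]"));  // a [X] b

        // quoteReplacement escapes both special characters so they are taken literally.
        System.out.println(replaceX("a X b", Matcher.quoteReplacement("\\n $0"))); // a \n $0 b
    }
}
```

The first call shows the same effect as in the question: the backslash is consumed by the replacement syntax and only the following character survives.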
To have them taken as literal characters, you need to escape the replacement string with Matcher.quoteReplacement. From https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#quoteReplacement(java.lang.String):
Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes (\) and dollar signs ($) will be given no special meaning.
So in your case:
.replaceAll(Matcher.quoteReplacement(StringEscapeUtils.escapeJava("replacement \n \b \t")))
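For anyone who wants to try this without adding Commons Lang to the classpath, below is a self-contained sketch. Note that escapeBasic is a hypothetical, simplified stand-in for StringEscapeUtils.escapeJava (it handles only a handful of characters), and the class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteReplacementDemo {

    // Hypothetical stand-in for StringEscapeUtils.escapeJava: handles only
    // newline, backspace, tab, backslash and double quote.
    static String escapeBasic(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes the replacement text, then quotes it so that replaceAll
    // treats backslashes and dollar signs as literal characters.
    static String replaceLiteral(String input, String regex, String replacement) {
        return Pattern.compile(regex)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(escapeBasic(replacement)));
    }

    public static void main(String[] args) {
        String json = "{\"test2\": \"Hello toReplace \\\"test\\\" world\"}";
        System.out.println(replaceLiteral(json, "toReplace", "replacement \n \b \t"));
        // prints: {"test2": "Hello replacement \n \b \t \"test\" world"}
    }
}
```

With the real library, the call to escapeBasic would simply be swapped for StringEscapeUtils.escapeJava, as in the line above.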
CodePudding user response:
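If you want a literal backslash in the replacement string passed to replaceAll, you need to escape it. This is described in the Matcher.replaceAll documentation.
StringEscapeUtils.escapeJava escapes a string so that it is suitable for use in Java source code, but it does not change how the compiler reads the escape sequences in your own string literals. In your literal they are already control characters: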
"replacement \n \b \t"
             ^ newline
                ^ backspace
                   ^ tab
If you want literal backslashes in a regular Java string, you need:
"replacement \\n \\b \\t"
Because this string is also the replacement argument of replaceAll, where a backslash is itself an escape character, each backslash has to be doubled once more:
"replacement \\\\n \\\\b \\\\t"
Try:
String replacedJson = Pattern.compile("toReplace")
        .matcher(json)
        .replaceAll("replacement \\\\n \\\\b \\\\t");
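The two layers of escaping can be checked separately. The sketch below (class and helper names are made up for illustration) first prints the length of each literal to show what the compiler produces, then shows what replaceAll does with the result.

```java
import java.util.regex.Pattern;

final class EscapingLayersDemo {

    // Hypothetical helper: replaces the token "toReplace" using replaceAll.
    static String replaceToken(String input, String replacement) {
        return Pattern.compile("toReplace").matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Layer 1: the Java compiler processes escapes inside string literals.
        System.out.println("\n".length());     // 1: a newline character
        System.out.println("\\n".length());    // 2: backslash, 'n'
        System.out.println("\\\\n".length());  // 3: backslash, backslash, 'n'

        // Layer 2: replaceAll processes backslashes in the replacement string,
        // so the two remaining backslashes collapse into one in the result.
        String result = replaceToken("Hello toReplace world", "replacement \\\\n");
        System.out.println(result);            // Hello replacement \n world
    }
}
```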
CodePudding user response:
You have to escape \ as well, using Matcher.quoteReplacement().
public static String replaceAll(String json, String regex, String replace) {
    return Pattern.compile(regex)
            .matcher(json)
            .replaceAll(Matcher.quoteReplacement(StringEscapeUtils.escapeJava(replace)));
}