I'm trying to find a solution for this problem. I have a list of characters that need to be replaced, each with a particular replacement that is mapped to the original character.
For example, I have a character map that holds the characters and their replacement values. Character map:
Map<String, String> characterMap = new HashMap<String, String>();
characterMap.put("&", "\x26");
characterMap.put("^", "\x5e");
String that needs to be replaced:
String hello = "Hello& World^";
I want to replace the characters in the hello string with the values from the map. The map is created from a property file, so it is dynamic.
Can I achieve this with a regex? Can I achieve this without iterating over the character map?
Answer:
You may use this code:
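import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

Map<String, String> characterMap = new HashMap<>();
characterMap.put("&", "\\x26");
characterMap.put("^", "\\x5e");
String hello = "Hello& World^";
// Replace each non-word character by its mapped value (or by itself if unmapped)
String result = Pattern.compile("\\W").matcher(hello).replaceAll(
    m -> characterMap.getOrDefault(m.group(), m.group())
        .replaceAll("\\\\", "$0$0"));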
Output (shown as a Java string literal; printed, it reads Hello\x26 World\x5e):
"Hello\\x26 World\\x5e"
Details:
- In the main regex we match \\W, which matches any non-word character.
- We look up the value of each matched non-word character in characterMap, or, if that key is not found, we get the same character back.
- We call .replaceAll("\\\\", "$0$0") on the extracted value to get the right escaping (assuming the values use single escaping). This is needed because the string returned by the lambda is itself treated as a replacement string, in which a single backslash escapes the next character. $0 is the complete string matched by the regex, here \\\\, and by using $0$0 we turn it into \\\\\\\\.
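To see why the doubling matters, the sketch below runs the same regex once with the map value used as-is and once with its backslashes doubled. The class and method names (DoublingDemo, raw, doubled) are only illustrative:

```java
import java.util.Map;
import java.util.regex.Pattern;

public class DoublingDemo {
    static final Map<String, String> MAP = Map.of("&", "\\x26", "^", "\\x5e");

    // Map value used as-is: "\x" in a replacement string means a literal 'x'
    static String raw(String s) {
        return Pattern.compile("\\W").matcher(s)
                .replaceAll(m -> MAP.getOrDefault(m.group(), m.group()));
    }

    // Backslashes doubled first, as in the answer: "\\x" yields a literal backslash
    static String doubled(String s) {
        return Pattern.compile("\\W").matcher(s)
                .replaceAll(m -> MAP.getOrDefault(m.group(), m.group())
                        .replaceAll("\\\\", "$0$0"));
    }

    public static void main(String[] args) {
        String hello = "Hello& World^";
        System.out.println(raw(hello));     // Hellox26 Worldx5e
        System.out.println(doubled(hello)); // Hello\x26 World\x5e
    }
}
```

Doubling only takes care of backslashes. An unmapped $ in the input (for example "a$b") is still read as the start of a group reference, so doubled("a$b") throws IllegalArgumentException; Matcher.quoteReplacement escapes both characters.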
Another, more optimized way is to build the regex from the keys of your map, so that only mapped characters are matched:
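import java.util.regex.Pattern;
import java.util.stream.Collectors;

// One alternation of the quoted keys, e.g. \Q&\E|\Q^\E
Pattern p = Pattern.compile(characterMap.keySet().stream()
    .map(Pattern::quote).collect(Collectors.joining("|")));
// every match is a key of the map, so plain get() is enough (no getOrDefault)
String result = p.matcher(hello).replaceAll(m ->
    characterMap.get(m.group()).replaceAll("\\\\", "$0$0"));
// => "Hello\\x26 World\\x5e"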
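A self-contained sketch of the key-based approach follows; the class and method names are illustrative. It adds two things that are not in the answer and are assumptions about what you might need: longer keys are tried first (only relevant if keys can be multi-character and overlap), and an empty map is returned early, because an empty alternation would match the empty string everywhere and get() would then return null. It also uses Matcher.quoteReplacement instead of doubling backslashes.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class KeyPatternReplace {
    static String replaceKeys(String s, Map<String, String> map) {
        if (map.isEmpty()) {
            return s; // an empty pattern would match between every character
        }
        // Quote every key and join them into one alternation, longest keys first
        String regex = map.keySet().stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Each match is a key; quoteReplacement inserts the value literally
        return Pattern.compile(regex).matcher(s)
                .replaceAll(m -> Matcher.quoteReplacement(map.get(m.group())));
    }

    public static void main(String[] args) {
        Map<String, String> characterMap = new LinkedHashMap<>();
        characterMap.put("&", "\\x26");
        characterMap.put("^", "\\x5e");
        System.out.println(replaceKeys("Hello& World^", characterMap)); // Hello\x26 World\x5e
    }
}
```

The pattern is rebuilt on every call here for brevity; if the map is loaded once from the property file, compiling the Pattern once and reusing it is the natural next step.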
Answer:
The backslash must be escaped, resulting in two backslashes (\\), both in a Java string literal and in a .properties file.
Map<String, String> characterMap = new LinkedHashMap<>(); // LinkedHashMap keeps insertion order
characterMap.put("&", "\\x26");
characterMap.put("^", "\\x5e");
But you mentioned that the source is a file:
ResourceBundle bundle = ResourceBundle.getBundle("charmap"); // loads charmap.properties from the classpath
String hello = "Hello& World^";
System.out.println("Replaced: " + mapReplace(hello, bundle));
charmap.properties:
\\ = xx
& = \\x26
^ = \\x5e
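To check that the .properties escaping behaves as described, the same three lines can be loaded with java.util.Properties from a String. This is only a sketch: the inline text stands in for the real charmap.properties, and the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscapeCheck {
    // The same three lines as charmap.properties, written as a Java string
    static final String CHARMAP = "\\\\ = xx\n& = \\\\x26\n^ = \\\\x5e\n";

    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(CHARMAP));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load();
        // the file text \\x26 is read back as the four characters \x26
        System.out.println(props.getProperty("&"));
        // the escaped key \\ is read back as a single backslash
        System.out.println(props.getProperty("\\"));
    }
}
```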
So that already inserted replacements are not replaced again, it is best to replace character by character in a single pass:
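import java.util.ResourceBundle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private String mapReplace(String s, ResourceBundle characterMap) {
    Matcher m = Pattern.compile(".").matcher(s); // one match per character
    return m.replaceAll(mr -> {
        String c = mr.group();
        if (characterMap.containsKey(c)) {
            c = characterMap.getString(c);
        }
        // quote the result (mapped or not) so that \ and $ are inserted literally
        return Matcher.quoteReplacement(c);
    });
}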
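To try mapReplace without a charmap.properties file on the classpath, a PropertyResourceBundle can be built from a Reader. In this sketch the file content is inlined and mapReplace is made static; both are stand-ins for illustration, not part of the answer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BundleReplaceDemo {
    static ResourceBundle bundle() {
        try {
            // Inline stand-in for charmap.properties
            return new PropertyResourceBundle(new StringReader(
                    "\\\\ = xx\n& = \\\\x26\n^ = \\\\x5e\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
    }

    static String mapReplace(String s, ResourceBundle characterMap) {
        Matcher m = Pattern.compile(".").matcher(s); // one match per character
        return m.replaceAll(mr -> {
            String c = mr.group();
            if (characterMap.containsKey(c)) {
                c = characterMap.getString(c);
            }
            return Matcher.quoteReplacement(c); // keep \ and $ literal
        });
    }

    public static void main(String[] args) {
        System.out.println("Replaced: " + mapReplace("Hello& World^", bundle()));
        // Replaced: Hello\x26 World\x5e
    }
}
```

Because every character is handled exactly once, the backslashes inserted for & and ^ are not themselves turned into xx.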
Matcher#replaceAll treats the replacement as a replacement string rather than literal text: a backslash escapes the next character and $ starts a group reference, so both have to be escaped again. Matcher#quoteReplacement does exactly that escaping.
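A small sketch of the difference, replacing & with a value that contains a backslash or a $ (class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    static final Pattern AMP = Pattern.compile("&");

    // Replacement used as-is: \ and $ keep their special meaning
    static String unquoted(String s, String value) {
        return AMP.matcher(s).replaceAll(value);
    }

    // Replacement quoted first: the value is inserted literally
    static String quoted(String s, String value) {
        return AMP.matcher(s).replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(unquoted("a&b", "\\x26")); // ax26b
        System.out.println(quoted("a&b", "\\x26"));   // a\x26b
        System.out.println(quoted("a&b", "$1"));      // a$1b
    }
}
```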
Last but not least, if you only want \x.. as the replacement:
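private String mapReplace(String s, String keys) {
    // escape every key so it is literal inside the character class
    String charClass = "[" + keys.replaceAll(".", "\\\\$0") + "]";
    // "\\\\x%02x" formats to the replacement \\x26, i.e. a literal \x26
    return Pattern.compile(charClass).matcher(s)
        .replaceAll(mr -> String.format("\\\\x%02x", (int) mr.group().charAt(0)));
}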
hello = mapReplace(hello, "^&");
The keys.replaceAll(".", "\\\\$0") is needed to get the regex [\^\&] (as a Java literal "[\\^\\&]"), because -, ^ and others have a special meaning inside a regex character class.
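What goes wrong without the escaping can be seen with the keys "^&": unescaped they form [^&], a negated class that matches everything except &. A sketch (names are illustrative) that replaces matches with _ to make this visible:

```java
import java.util.regex.Pattern;

public class CharClassEscapeDemo {
    // Builds [keys] with no escaping
    static String unescaped(String s, String keys) {
        return Pattern.compile("[" + keys + "]").matcher(s).replaceAll("_");
    }

    // Builds [keys] with every key character backslash-escaped
    static String escaped(String s, String keys) {
        return Pattern.compile("[" + keys.replaceAll(".", "\\\\$0") + "]")
                .matcher(s).replaceAll("_");
    }

    public static void main(String[] args) {
        String hello = "Hello& World^";
        // "[^&]" is negated: every character except & is replaced
        System.out.println(unescaped(hello, "^&")); // _____&_______
        // "[\^\&]" matches only ^ and &
        System.out.println(escaped(hello, "^&"));   // Hello_ World_
    }
}
```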
In general it would have been advisable to use \uXXXX instead of \xXX.
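With \uXXXX, the same approach only needs a different format string: %04x pads the character code to the four hex digits such an escape expects. A sketch, with illustrative class and method names:

```java
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {
    static String toUnicodeEscapes(String s, String keys) {
        // escape every key so it is literal inside the character class
        String charClass = "[" + keys.replaceAll(".", "\\\\$0") + "]";
        // four hex digits per char, e.g. 0026 for the ampersand
        return Pattern.compile(charClass).matcher(s)
                .replaceAll(mr -> String.format("\\\\u%04x", (int) mr.group().charAt(0)));
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeEscapes("Hello& World^", "^&"));
    }
}
```

For "Hello& World^" this prints Hello\u0026 World\u005e.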
Judging by those two characters, this task almost looks like an ASCII-to-EBCDIC conversion, which could be done differently.
Also, you might be able to use URL encoding (& becomes %26, ^ becomes %5E), for which there is URLEncoder.
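A minimal sketch of the URLEncoder route, assuming Java 10 or later for the Charset overload. Note that it performs form encoding, so the space also changes, to +:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodeDemo {
    static String encode(String s) {
        // form encoding: space becomes '+', & and ^ become %26 and %5E
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello& World^")); // Hello%26+World%5E
    }
}
```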