Check String against list of characters and replace it dynamically - Regex

Time:01-10

I'm trying to find a solution to the following problem. I have a list of characters, each of which needs to be replaced with a particular character sequence that is mapped to the original character.

Ex: I have a character map which holds the characters and their replacement values. Character map:

Map<String, String> characterMap = new HashMap<String, String>();
characterMap.put("&", "\x26");
characterMap.put("^", "\x5e");

String that needs to be replaced: String hello = "Hello& World^"; I want to replace the characters in the hello string with the values in the map. The map is created from a property file and is dynamic.

Can I achieve this with a regex? Can I achieve this without iterating over the character map?

CodePudding user response:

You may use this code:

Map<String, String> characterMap = new HashMap<>();
characterMap.put("&", "\\x26");
characterMap.put("^", "\\x5e");

String hello = "Hello& World^"; 

Pattern.compile("\\W").matcher(hello).replaceAll(
   m -> characterMap.getOrDefault(m.group(), m.group())
        .replaceAll("\\\\", "$0$0"));

Output (shown as a Java string literal; the actual text is Hello\x26 World\x5e):

"Hello\\x26 World\\x5e"

Details:

  • In the main regex we match \\W, which matches any non-word character (including the space).
  • We look up each matched character in characterMap; if that key is not found, we get the same character back.
  • We call .replaceAll("\\\\", "$0$0") on the looked-up value to double each backslash, because a backslash is an escape character in a Matcher replacement string (this assumes the values use single escaping). $0 is the complete match, here a single backslash (written \\\\ in the Java regex literal), so $0$0 turns it into two.
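As a self-contained variant of the snippet above, here is a minimal sketch that swaps the manual backslash doubling for Matcher.quoteReplacement, which also keeps an unmapped $ or \ in the input from being read as replacement syntax. The class and method names (NonWordReplaceSketch, replaceMapped) are made up for illustration, and Matcher#replaceAll with a function requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonWordReplaceSketch {

    // Replaces every non-word character that has a mapping; unmapped ones are kept as-is.
    static String replaceMapped(String input, Map<String, String> characterMap) {
        return Pattern.compile("\\W").matcher(input).replaceAll(
            // quoteReplacement escapes backslashes and dollar signs for the replacement string
            m -> Matcher.quoteReplacement(
                     characterMap.getOrDefault(m.group(), m.group())));
    }

    public static void main(String[] args) {
        Map<String, String> characterMap = new HashMap<>();
        characterMap.put("&", "\\x26");
        characterMap.put("^", "\\x5e");

        System.out.println(replaceMapped("Hello& World^", characterMap));
        // prints: Hello\x26 World\x5e
    }
}
```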



Another, more targeted way of doing this is to construct the regex from the keys of your map, like this:

Pattern p = Pattern.compile(characterMap.keySet().stream()
   .map(s -> Pattern.quote(s)).collect(Collectors.joining("|")));

// then use it with get; every match is a key of the map, so no default is needed
p.matcher(hello).replaceAll(m -> 
   characterMap.get(m.group()).replaceAll("\\\\", "$0$0"));
// => "Hello\\x26 World\\x5e"
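A runnable sketch of this key-based approach might look as follows. The names KeyAlternationSketch, patternFor and replaceMapped are hypothetical, and the empty-map guard is an addition of this sketch: an empty alternation would match the empty string at every position and the lookup would then return null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class KeyAlternationSketch {

    // Builds an alternation such as \Q&\E|\Q^\E from the map keys.
    static Pattern patternFor(Map<String, String> characterMap) {
        return Pattern.compile(characterMap.keySet().stream()
            .map(Pattern::quote) // keys are matched literally, even ".", "^" or "|"
            .collect(Collectors.joining("|")));
    }

    static String replaceMapped(String input, Map<String, String> characterMap) {
        if (characterMap.isEmpty()) {
            return input; // nothing to replace
        }
        return patternFor(characterMap).matcher(input)
            .replaceAll(m -> Matcher.quoteReplacement(characterMap.get(m.group())));
    }

    public static void main(String[] args) {
        Map<String, String> characterMap = new LinkedHashMap<>();
        characterMap.put("&", "\\x26");
        characterMap.put("^", "\\x5e");

        System.out.println(replaceMapped("Hello& World^", characterMap));
        // prints: Hello\x26 World\x5e
    }
}
```

Unlike the \\W version, only characters that are actually keys are matched, so the space never reaches the lookup.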

CodePudding user response:

The backslash must be escaped as two backslashes (\\), both in a Java string literal and in a .properties file.

    Map<String, String> characterMap = new LinkedHashMap<>(); // Linked = keeps insertion order.
    characterMap.put("&", "\\x26");
    characterMap.put("^", "\\x5e");

But you mentioned that the source is a file, so a ResourceBundle can be used:

    ResourceBundle bundle = ResourceBundle.getBundle("charmap");

    String hello = "Hello& World^";

    System.out.println("Replaced: " + mapReplace(hello, bundle));

charmap.properties:

\\ = xx
& = \\x26
^ = \\x5e
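To check how the doubled backslashes in such a file are read, here is a small sketch that parses the same kind of entries with java.util.Properties from an in-memory string instead of a real charmap.properties file. The class name and the load helper are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CharmapPropertiesSketch {

    // Parses .properties-formatted text; stands in for reading the real file.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The Java literal "\\\\" is two backslashes in the file content.
        String fileContent = "& = \\\\x26\n^ = \\\\x5e\n";
        Properties props = load(fileContent);

        System.out.println(props.getProperty("&")); // prints: \x26
        System.out.println(props.getProperty("^")); // prints: \x5e
    }
}
```

Each value ends up with a single backslash, which is the text that should appear in the replaced string.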

So that text inserted by one replacement is not itself replaced again, it is best to replace character by character in a single pass.

private String mapReplace(String s, ResourceBundle characterMap) {
    Pattern p = Pattern.compile(".");
    Matcher m = p.matcher(s);
    return m.replaceAll(mr -> {
        String c = mr.group();
        if (characterMap.containsKey(c)) {
            c = characterMap.getString(c);
        }
        // Quote in both cases, so a literal \ or $ in the input or in a mapped value is safe.
        return Matcher.quoteReplacement(c);
    });
}

Matcher#replaceAll treats the replacement as a replacement pattern, in which the backslash and the dollar sign $ have a special meaning and must be escaped again. Matcher#quoteReplacement does that escaping.
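The effect of the quoting can be seen in a two-line comparison. This is only an illustrative sketch; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;

public class QuoteReplacementSketch {

    // Without quoting, the backslash escapes the following 'x' and is lost.
    static String unquoted(String s) {
        return s.replaceAll("&", "\\x26");
    }

    // With quoting, the backslash survives as a literal character.
    static String quoted(String s) {
        return s.replaceAll("&", Matcher.quoteReplacement("\\x26"));
    }

    public static void main(String[] args) {
        System.out.println(unquoted("a&b")); // prints: ax26b
        System.out.println(quoted("a&b"));   // prints: a\x26b
    }
}
```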

Last but not least, if you only want \x.. as replacement:

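private String mapReplace(String s, String keys) {
    // Character class of the escaped keys, e.g. [\^\&]
    return Pattern.compile("[" + keys.replaceAll(".", "\\\\$0") + "]").matcher(s)
            // Two-digit hex code of the matched character, e.g. \x26
            .replaceAll(mr -> String.format("\\\\x%02x", (int) mr.group().charAt(0)));
}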

hello = mapReplace(hello, "^&");

The keys.replaceAll(".", "\\\\$0") is needed to produce the regex [\^\&] (as a Java literal, "[\\^\\&]"), because -, ^ and other characters have a special meaning inside a character class.

In general it would have been advisable to use \uXXXX instead of \xXX.
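If the four-digit Unicode form is preferred, a minimal sketch could format each listed character with %04x instead. The class name, the escape method and the plain loop (used here instead of a regex) are choices of this sketch, not part of the answer above.

```java
public class UnicodeEscapeSketch {

    // Replaces every character of s that occurs in keys with a backslash,
    // the letter u and the four-digit hex code of that character.
    static String escape(String s, String keys) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (keys.indexOf(c) >= 0) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // & becomes backslash-u 0026, ^ becomes backslash-u 005e
        System.out.println(escape("Hello& World^", "&^"));
    }
}
```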


This task almost looks like an ASCII-to-EBCDIC conversion, seeing those two characters. That could be done differently.

Also, you might be able to use URL encoding (%26 and %5E), for which URLEncoder exists.
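For the URL-encoding route, a minimal sketch with the standard java.net.URLEncoder could look like this (the Charset overload needs Java 10 or later; the class name is made up). Note that it encodes every reserved character, not only the mapped ones, and turns the space into +.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodeSketch {

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello& World^"));
        // prints: Hello%26+World%5E
    }
}
```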
