Replacing all regex matches with masking characters in Java

Time:08-04

Java 11 here. I have a huge String that will contain 0+ instances of the following "fizz token":

  • the substring "fizz"
  • followed by any integer 0+
  • followed by an equals sign ("=")
  • followed by another string of any kind, a.k.a. the "fizz value"
  • terminated by the first whitespace (including tabs, newlines, etc.)

So some examples of a valid fizz token:

  1. fizz0=fj49jc49fj59
  2. fizz39=f44kk5k59
  3. fizz101023=jjj

Some examples of invalid fizz tokens:

  • fizz=9d94dj49j4 <-- missing an integer after "fizz" and before "="
  • fizz2= <-- missing a fizz value after "="

I am trying to write a Java method that will:

  • Find all instances of matching fizz tokens inside my huge input String
  • Obtain each fizz token's value
  • Replace each character of the token value with an upper-case X ("X")

So for example:

| Fizz Token         | Token Value  | Final Result       |
|--------------------|--------------|--------------------|
| fizz0=fj49jc49fj59 | fj49jc49fj59 | fizz0=XXXXXXXXXXXX |
| fizz39=f44kk5k59   | f44kk5k59    | fizz39=XXXXXXXXX   |
| fizz101023=jjj     | jjj          | fizz101023=XXX     |

I need the method to do this replacement on the token values of all fizz tokens found in the input string, hence:

String input = "Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj";
String masked = mask(input);

// Outputs: Some initial text fizz0=XXXXXXXXXXXX then some more fizz101023=XXX
System.out.println(masked);

My best attempt thus far is a massive WIP:

public class Masker {
    private Pattern fizzTokenPattern = Pattern.compile("fizz{d*}=*");
    public String mask(String input) {
        Matcher matcher = fizzTokenPattern.matcher(input);
        int numMatches = matcher.groupCount();
        for (int i = 0; i < numMatches; i++) {
            // how to get the token value from the group?
            String tokenValue = matcher.group(i); // ex: fj49jc49fj59
            // how to replace each character with an X?
            // ex: fj49jc49fj59 ==> XXXXXXXXXXXX
            String masked = tokenValue.replaceAll("*", "X");
            // how to grab the original (matched) token and replace it with the new
            // 'masked' string?
            String entireTokenWithValue = input.substring(matcher.group(i));
        }
    }
}

I feel like I'm in the ballpark but missing some core concepts. Anybody have any ideas?

CodePudding user response:

According to your requirements:

  1. the substring "fizz"
  2. followed by any integer 0+
  3. followed by an equals sign ("=")
  4. followed by another string of any kind, a.k.a. the "fizz value"
  5. terminated by the first whitespace (including tabs, newlines, etc.)

a regex that fulfills them can be built from these parts:

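  1. fizz
  2. \d+ - one or more digits
  3. =
  4. \S+ - one or more non-whitespace characters; this covers requirements 4 and 5, since the match stops at the first whitespace

which gives us "fizz\\d+=\\S+" (written as a Java string literal, hence the doubled backslashes).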
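As a quick sanity check of that pattern, here is a minimal sketch that runs it against one valid and two invalid examples from the question. The class name FizzPatternCheck and the helper isFizzToken are hypothetical names made up for this illustration; matches() is used so the whole candidate string has to be a fizz token.

```java
import java.util.regex.Pattern;

class FizzPatternCheck {

    // The pattern described above, without capturing groups.
    private static final Pattern FIZZ_TOKEN = Pattern.compile("fizz\\d+=\\S+");

    // Hypothetical helper: true only if the entire candidate is one fizz token.
    static boolean isFizzToken(String candidate) {
        return FIZZ_TOKEN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFizzToken("fizz0=fj49jc49fj59")); // true
        System.out.println(isFizzToken("fizz=9d94dj49j4"));    // false: no integer after "fizz"
        System.out.println(isFizzToken("fizz2="));             // false: no value after "="
    }
}
```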

But since you only want to modify part of each match and reuse the rest, we can wrap those parts in groups: "(fizz\\d+=)(\\S+)". The replacement then needs to

  • put back what was matched by "(fizz\\d+=)"
  • modify what was matched by "(\\S+)"
    • this modification is simply X repeated n times, where n is the length of what was matched by group "(\\S+)".

In other words, your code can look like this:

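import java.util.regex.Pattern;

class Masker {

    // group 1: "fizz", digits and "="; group 2: the value to mask
    private static final Pattern p = Pattern.compile("(fizz\\d+=)(\\S+)");

    public static String mask(String input) {
        // Matcher.replaceAll(Function) is available since Java 9, String.repeat since Java 11
        return p.matcher(input)
                .replaceAll(match -> match.group(1) + "X".repeat(match.group(2).length()));
    }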


    //DEMO
    public static void main(String[] args) throws Exception {
        String input = "Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj";
        String masked = Masker.mask(input);
        System.out.println(input);
        System.out.println(masked);
    }
}

Output:

Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj
Some initial text fizz0=XXXXXXXXXXXX then some more fizz101023=XXX

Version 2: with named groups, which is more readable and easier to maintain

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Masker {

    private static final Pattern p = Pattern.compile("(?<token>fizz\\d+=)(?<value>\\S+)");

    public static String mask(String input) {

        StringBuilder sb = new StringBuilder();
        Matcher m = p.matcher(input);

        while (m.find()) {
            String token = m.group("token");
            String value = m.group("value");
            String maskedValue = "X".repeat(value.length());
            // copies the text since the previous match, then the masked token
            m.appendReplacement(sb, token + maskedValue);
        }
        // copies whatever is left after the last match
        m.appendTail(sb);

        return sb.toString();
    }


    //DEMO
    public static void main(String[] args) throws Exception {
        String input = "Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj";
        String masked = Masker.mask(input);
        System.out.println(input);
        System.out.println(masked);
    }
}
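
To see how this behaves on the invalid tokens from the question, here is a small sketch that reuses the same grouped pattern on an input mixing invalid tokens, a tab and valid tokens. The class name MaskEdgeCases and the sample input are made up for this illustration.

```java
import java.util.regex.Pattern;

class MaskEdgeCases {

    // Same grouped pattern as in the first version above.
    private static final Pattern P = Pattern.compile("(fizz\\d+=)(\\S+)");

    static String mask(String input) {
        return P.matcher(input)
                .replaceAll(m -> m.group(1) + "X".repeat(m.group(2).length()));
    }

    public static void main(String[] args) {
        // "fizz=..." has no integer and "fizz2=" has no value, so both stay untouched;
        // the tab ends the value of fizz7, so fizz8 is masked as a separate token.
        String input = "fizz=9d94dj49j4 fizz2= fizz7=abc\tfizz8=de";
        System.out.println(mask(input));
        // prints: fizz=9d94dj49j4 fizz2= fizz7=XXX<TAB>fizz8=XX (with a real tab character)
    }
}
```

Note that the pattern does not require whitespace before "fizz", so a token embedded in a longer word would also be masked; whether that matters depends on your input.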