How To Match Repeating Sub-Patterns

Time:09-07

Let's say I have a string:

String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";

And I would like to find all their respective names and ages with the following pattern matcher:

String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sentence);

I understand something like

matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);

would give me only my last niece (" Tara:10", "Tara", "10").

How would I collect all of my nieces instead of only the last?

CodePudding user response:

You can't iterate over repeating groups, because a repeated capturing group keeps only its last capture. You can, however, match each name:age pair individually, calling find() in a loop to get the details of each one. If the pairs need to be back-to-back, you can bound the matcher's region so that each search starts where the previous match ended, like this:

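// First locate the fixed prefix and remember where it ends.
Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
    int boundary = matcher.end();

    // ^ anchors each pair to the start of the current region (anchoring bounds are on by default).
    matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
    while (matcher.region(boundary, sentence.length()).find()) {
        System.out.println(matcher.group());   // e.g. " Cara:8"
        System.out.println(matcher.group(1));  // name
        System.out.println(matcher.group(2));  // age

        // Move the region start past the pair just matched.
        boundary = matcher.end();
    }
}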

CodePudding user response:

Another idea is to use the \G anchor, which matches where the previous match ended (or at the start of the input if there is no previous match).

(?:\G(?!\A)|My\s+nieces\s+are)\s+(\S+):(\d+)
  • If My\s+nieces\s+are matches, the chain starts there
  • \G chains each following match to the end of the previous one
  • (?!\A) is a negative lookahead that prevents \G from matching at the start of the string (\A)
  • \s+(\S+):(\d+) uses two capturing groups to extract the name and the age

See this demo at regex101 or a Java demo at tio.run (escape backslashes as a Java String)
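
For reference, here is a minimal sketch of how the \G pattern might look once escaped as a Java String and used with find() in a loop. The class name NieceMatcher and the helper findNieces are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NieceMatcher {
    // Either start the chain at the literal prefix, or continue (\G) from the
    // end of the previous match; (?!\A) stops \G from matching at offset 0.
    private static final Pattern NIECES =
            Pattern.compile("(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)");

    // Hypothetical helper: returns one {name, age} pair per chained match.
    static List<String[]> findNieces(String sentence) {
        List<String[]> nieces = new ArrayList<>();
        Matcher matcher = NIECES.matcher(sentence);
        while (matcher.find()) {
            nieces.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return nieces;
    }

    public static void main(String[] args) {
        for (String[] niece : findNieces("My nieces are Cara:8 Sarah:9 Tara:10")) {
            System.out.println(niece[0] + " is " + niece[1]);
        }
    }
}
```

Because every match after the first has to begin exactly where the previous one ended, the chain stops at the first token that is not a name:age pair, so pairs appearing later in the sentence are not picked up.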
