Let's say I have a string:
String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";
And I would like to find all their respective names and ages with the following pattern matcher:
String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sentence);
I understand something like
matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);
would give me my last niece (" Tara:10", "Tara", "10").
How would I collect all of my nieces instead of only the last?
CodePudding user response:
You can't iterate over repeating groups (a repeated capturing group only keeps its last iteration), but you can match each group individually, calling find() in a loop to get the details of each one. If they need to be back-to-back, you can iteratively bound your matcher to the last index, like this:
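// Find the prefix first, then match one "name:age" pair at a time.
Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
    int boundary = matcher.end();
    // With the default anchoring bounds, ^ matches at the start of the region.
    matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
    while (matcher.region(boundary, sentence.length()).find()) {
        System.out.println(matcher.group());  // e.g. " Cara:8"
        System.out.println(matcher.group(1)); // name
        System.out.println(matcher.group(2)); // age
        boundary = matcher.end();             // next pair must start here
    }
}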
CodePudding user response:
Another idea is to use the \G anchor, which matches where the previous match ended (or at the start of the input).
(?:\G(?!\A)|My\s+nieces\s+are)\s+(\S+):(\d+)
- If My\s+nieces\s+are matches, \G will chain matches from there
- (?!\A) is a negative lookahead that prevents \G from matching at the \A start
- \s+(\S+):(\d+) uses two capturing groups for extraction
See this demo at regex101 or a Java demo at tio.run (escape backslashes as a Java String)
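For reference, here is a minimal sketch of how the \G pattern above could be used from Java to collect every niece. The class name NieceFinder and the helper findNieces are made up for illustration; only the pattern itself comes from the answer, with the backslashes escaped for a Java string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
public class NieceFinder {
    // Either start the chain at the prefix, or continue exactly where the
    // previous match ended (\G), but never at the very start of the input.
    private static final Pattern NIECE =
            Pattern.compile("(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)");

    // Returns one {name, age} pair per niece, in order of appearance.
    static List<String[]> findNieces(String sentence) {
        List<String[]> nieces = new ArrayList<>();
        Matcher matcher = NIECE.matcher(sentence);
        while (matcher.find()) {
            nieces.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return nieces;
    }

    public static void main(String[] args) {
        for (String[] niece : findNieces("My nieces are Cara:8 Sarah:9 Tara:10")) {
            System.out.println(niece[0] + " is " + niece[1]);
        }
    }
}
```

Because each match has to begin where the previous one ended, the chain stops at the first token that is not a name:age pair, so later pairs that are not back-to-back with the prefix are not picked up.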