I am having trouble writing a method in Java. It extracts the text that matches a pattern and returns ALL the extractions. It works just like java.util.regex.Matcher's find()/matches() followed by group():
Matcher matcher = pattern.matcher(fileContent);
StringBuilder sb = new StringBuilder();
// find() advances to the next match on each call; matches() would only test the whole input
while (matcher.find()) {
    sb.append(matcher.group()).append("\n");
}
return sb.toString();
However, I would like the extractions to be formatted with support for group references (dollar sign, $) and literal-character escaping (backslash, \), just like the replacement string in Matcher.replaceAll(replacement) (Doc). For example:
fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
pattern = Pattern.compile("bb.*(.)(abb)");
extractionFormatter = "$1: $0, \\$$2";
The expected output would be:
a: bbcac aabb, $abb
b: bbccc babb, $abb
I hope it is clear what I am trying to do. Is there an existing library or method that can achieve this, so that I don't have to reinvent the wheel?
CodePudding user response:
You can use the results method of the Matcher class, which returns a stream of MatchResult objects, to get all matches first. Then get each match as a string using MatchResult.group, apply String.replaceAll to it with the pattern as the regex and your extractionFormatter as the replacement, and finally join everything with a line separator:
// needs java.util.regex.MatchResult, java.util.regex.Pattern, java.util.stream.Collectors
String fileContent = "aaabbcac aabb\n" +
        "bcbcbbccc babba";
Pattern pattern = Pattern.compile("bb.*(.)(abb)");
String extractionFormatter = "$1: $0, \\$$2";
String output = pattern.matcher(fileContent)
        .results()
        .map(MatchResult::group)
        .map(s -> s.replaceAll(pattern.pattern(), extractionFormatter))
        .collect(Collectors.joining(System.lineSeparator()));
System.out.println(output);
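One caveat with re-running the pattern on each matched substring: if the pattern depends on context outside the match (lookarounds, anchors, word boundaries), the isolated substring may no longer match and is then returned unformatted. A possible alternative, sketched below, lets Matcher.appendReplacement expand the template against the live match and then drops the unmatched text it copies in front of it. The class name ExtractWithFormat and the helper extractAll are made up for illustration, and the appendReplacement(StringBuilder, String) overload assumes Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractWithFormat {

    // Hypothetical helper (not a JDK method): expands a replaceAll-style
    // template for every match and joins the results with '\n'.
    static String extractAll(Pattern pattern, String input, String format) {
        Matcher matcher = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        StringBuilder scratch = new StringBuilder();
        int appendPos = 0; // mirrors the matcher's internal append position
        while (matcher.find()) {
            scratch.setLength(0);
            // Appends input[appendPos, start) followed by the expanded template.
            matcher.appendReplacement(scratch, format);
            int skipped = matcher.start() - appendPos;
            if (out.length() > 0) {
                out.append('\n');
            }
            // Keep only the expanded template, not the text before the match.
            out.append(scratch, skipped, scratch.length());
            appendPos = matcher.end();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fileContent = "aaabbcac aabb\nbcbcbbccc babba\n";
        Pattern pattern = Pattern.compile("bb.*(.)(abb)");
        String extractionFormatter = "$1: $0, \\$$2";
        System.out.println(extractAll(pattern, fileContent, extractionFormatter));
        // prints:
        // a: bbcac aabb, $abb
        // b: bbccc babb, $abb
    }
}
```

Because the template is expanded while the matcher still holds the original match, a pattern such as (?<=a)bb keeps working, whereas re-matching the extracted "bb" on its own would not.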
CodePudding user response:
You can use String.replaceAll instead.
Note that to get the desired output from capture groups alone, the pattern also has to match (and thereby remove) every part of the string that should not appear in the replacement.
A pattern that gives the desired output:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pattern = "(?m)^.*?(bb\\S*).*(.)(abb).*$";
String extractionFormatter = "$2: $1 $2$3, \\$$3";
System.out.print(fileContent.replaceAll(pattern, extractionFormatter));
Output
a: bbcac aabb, $abb
b: bbccc babb, $abb
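Keep in mind that replaceAll only rewrites what the pattern matches, so a line that does not match stays in the result unchanged. The sketch below illustrates this with one extra, made-up input line; the class name ReplaceAllKeepsUnmatched and the helper format are illustrative only.

```java
public class ReplaceAllKeepsUnmatched {

    // Applies the pattern and formatter from the answer above to the whole input.
    static String format(String input) {
        String pattern = "(?m)^.*?(bb\\S*).*(.)(abb).*$";
        String extractionFormatter = "$2: $1 $2$3, \\$$3";
        return input.replaceAll(pattern, extractionFormatter);
    }

    public static void main(String[] args) {
        // The middle line contains no "bb", so the pattern cannot match it.
        String input = "aaabbcac aabb\nno match here\nbcbcbbccc babba\n";
        System.out.print(format(input));
        // prints:
        // a: bbcac aabb, $abb
        // no match here
        // b: bbccc babb, $abb
    }
}
```

If the input can contain such lines and only the extractions are wanted, a find() loop is probably the safer choice.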
Or, using a StringBuilder, a Matcher, and a while loop:
String fileContent = """
aaabbcac aabb
bcbcbbccc babba
""";
String pat = "bb.*(.)(abb)";
Pattern pattern = Pattern.compile(pat);
Matcher matcher = pattern.matcher(fileContent);
String extractionFormatter = "$1: $0, \\$$2";
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    sb.append(matcher.group().replaceAll(pat, extractionFormatter)).append("\n");
}
System.out.print(sb);
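For reference, this is why the loop above calls find(): matches() only succeeds when the entire input matches the pattern, and repeated calls give the same answer, so a while loop around it either never runs or never ends. find() moves on to the next occurrence each time. A small sketch with the sample data (the class name FindVsMatches and the helper countFinds are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {

    // Counts how many times find() succeeds on the input.
    static int countFinds(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("bb.*(.)(abb)");
        String fileContent = "aaabbcac aabb\nbcbcbbccc babba\n";

        // The whole input is not a single match, so matches() is false.
        System.out.println(pattern.matcher(fileContent).matches()); // false

        // find() locates one match per line of the sample.
        System.out.println(countFinds(pattern, fileContent)); // 2
    }
}
```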