The string would be much longer, but I have shortened it for this example:
String s1 = "henry20|*|bob10|*|mark20|*|justin30|*|kyle15|*|95|*|henry30|*|bob50|*|mark70|*|justin30|*|kyle25|*|1000|";
Basically, this pattern keeps repeating, and I would like a regex that matches only the tokens where henry, mark, or kyle appear (with their number included) and the tokens containing only numbers.
Essentially, my output should look like this:
(henry20, mark20, kyle15, 95, henry30, mark70, kyle25, 1000)
CodePudding user response:
You could use this regex:
(^(((henry|mark|kyle)\d+(?=\|))|\d+(?=\|))|((?<=\|)(henry|mark|kyle)\d+(?=\|))|(?<=\|)\d+(?=\|)|((?<=\|)(henry|mark|kyle)\d+|(?<=\|)\d+)$)
This regex can be broken down into three parts:
one checking that an occurrence appears at the beginning (the start of the string before the token and a pipe after it):
^(((henry|mark|kyle)\d+(?=\|))|\d+(?=\|))
one checking that an occurrence appears in between (a pipe character before and after the token):
(((?<=\|)(henry|mark|kyle)\d+(?=\|))|(?<=\|)\d+(?=\|))
one checking that an occurrence appears at the end of the string (a pipe character before the token, followed by the end of the string):
((?<=\|)(henry|mark|kyle)\d+|(?<=\|)\d+)$
I know that in your example the last token has a pipe after it, but since the first one is not preceded by a pipe, I assumed that the string might also end without a pipe, so I've covered that case too.
Each one of these three bits can be further broken down into:
A positive lookbehind that makes sure that every token is preceded by a pipe (second and third case).
A capturing group that matches either henry, mark, or kyle followed by one or more digits, or an occurrence made of only one or more digits.
A positive lookahead that makes sure that every token is followed by a pipe character (first and second case).
Here is a link to test the regex:
https://regex101.com/r/WWEQZM/4
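To apply the pattern in Java, every backslash has to be doubled inside the string literal. The following is a minimal sketch of how the matches could be collected with Pattern and Matcher; the class name RegexTokenDemo and the method extractTokens are made up for illustration, and the input uses the digit 0 in henry20 and henry30, as in the expected output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTokenDemo {
    // The regex from this answer, split over several lines for readability.
    // Each backslash is doubled because this is a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
        "(^(((henry|mark|kyle)\\d+(?=\\|))|\\d+(?=\\|))"      // token at the start
        + "|((?<=\\|)(henry|mark|kyle)\\d+(?=\\|))"           // name token in between
        + "|(?<=\\|)\\d+(?=\\|)"                              // number token in between
        + "|((?<=\\|)(henry|mark|kyle)\\d+|(?<=\\|)\\d+)$)"); // token at the end

    // Hypothetical helper: returns every match in order of appearance.
    static List<String> extractTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s1 = "henry20|*|bob10|*|mark20|*|justin30|*|kyle15|*|95|*|henry30|*|bob50|*|mark70|*|justin30|*|kyle25|*|1000|";
        // Prints [henry20, mark20, kyle15, 95, henry30, mark70, kyle25, 1000]
        System.out.println(extractTokens(s1));
    }
}
```

Because the lookarounds do not consume the pipes, matcher.group() returns the bare token without any delimiter.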
CodePudding user response:
I would suggest using the split() method of String; I find it more readable than a long regex.
The regex to split on is (\|\*)*\| which means: a pipe followed by an asterisk, zero or more times, followed by a pipe.
Then you can check whether each element is a number or contains henry, mark, or kyle.
import java.util.StringJoiner;

public class Temp {
    public static void main(String[] args) throws Exception {
        String s1 = "henry20|*|bob10|*|mark20|*|justin30|*|kyle15|*|95|*|henry30|*|bob50|*|mark70|*|justin30|*|kyle25|*|1000|";
        // Split on the "|*|" delimiter (and on the single trailing "|").
        String[] data = s1.split("(\\|\\*)*\\|");
        StringJoiner joiner = new StringJoiner(", ", "(", ")");
        for (String string : data) {
            // Keep tokens that are all digits or that contain one of the names.
            if (string.matches("\\d+") || string.contains("henry") || string.contains("mark") || string.contains("kyle")) {
                joiner.add(string);
            }
        }
        // Prints (henry20, mark20, kyle15, 95, henry30, mark70, kyle25, 1000)
        System.out.println(joiner);
    }
}
Another option would be to extract the alphanumeric matches, since it looks like you only need the alphanumeric parts. \w matches alphanumerics and underscore.
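// Needs java.util.StringJoiner, java.util.regex.Matcher and java.util.regex.Pattern.
// s1 is the same input string as in the previous example.
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(s1);
StringJoiner joiner = new StringJoiner(", ", "(", ")");
while (matcher.find()) {
    String string = matcher.group();
    // Same filter as before: all digits, or contains one of the names.
    if (string.matches("\\d+") || string.contains("henry") || string.contains("mark") || string.contains("kyle")) {
        joiner.add(string);
    }
}
System.out.println(joiner);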
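One caveat with contains(): it also accepts tokens such as henryx or xmark, which have a name but not the name-plus-digits shape from the question. If that matters, the check could be tightened to a whole-token match. This is only a sketch under that assumption; the class name StrictTokenFilter and the method filter are made up for illustration.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

class StrictTokenFilter {
    // An optional name (henry, mark or kyle) followed by one or more digits.
    // Used with matches(), so the whole token has to fit this shape.
    private static final Pattern WANTED = Pattern.compile("(?:henry|mark|kyle)?\\d+");

    // Hypothetical helper: splits on the delimiter and keeps only wanted tokens.
    static String filter(String input) {
        StringJoiner joiner = new StringJoiner(", ", "(", ")");
        for (String token : input.split("(\\|\\*)*\\|")) {
            if (WANTED.matcher(token).matches()) {
                joiner.add(token);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String s1 = "henry20|*|bob10|*|mark20|*|justin30|*|kyle15|*|95|*|henry30|*|bob50|*|mark70|*|justin30|*|kyle25|*|1000|";
        // Prints (henry20, mark20, kyle15, 95, henry30, mark70, kyle25, 1000)
        System.out.println(filter(s1));
    }
}
```

For the sample string the result is the same as with contains(); the difference only shows up on tokens like henryx or kyle, which this version drops.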