Split String by | and numbers

Time:02-22

Let's imagine I have the following strings:

String one = "123|abc|123abc";
String two = "123|ab12c|abc|456|abc|def";
String three = "123|1abc|1abc1|456|abc|wer";
String four = "123|abc|def|456|ghi|jkl|789|mno|pqr";

If I do a split on them I expect the following output:

one = ["123|abc|123abc"];
two = ["123|ab12c|abc", "456|abc|def"];
three = ["123|1abc|1abc1", "456|abc|wer"];
four = ["123|abc|def", "456|ghi|jkl", "789|mno|pqr"];

The string has the following structure:

Each string starts with one or more digits, followed by any number of groups, where each group is a | followed by one or more characters.

When the field after a | consists only of digits, it is considered the start of a new value.

More examples:

In - 123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty
Out - ["123456|xxxxxx|zzzzzzz|xa2314|xzxczxc", "1234|qwerty"]

I tried multiple variations of the following, but none of them work:

value.split( "\\|\\d+|\\d+" )

CodePudding user response:

You may split on \|(?=\d+(?:\||$)):

import java.util.Arrays;
import java.util.List;

List<String> nums = Arrays.asList(
    "123|abc|123abc",
    "123|ab12c|abc|456|abc|def",
    "123|1abc|1abc1|456|abc|wer",
    "123|abc|def|456|ghi|jkl|789|mno|pqr"
);

for (String num : nums) {
    // Split on a pipe only when the next field is all digits
    String[] parts = num.split("\\|(?=\\d+(?:\\||$))");
    System.out.println(num + " => " + Arrays.toString(parts));
}

This prints:

123|abc|123abc => [123|abc|123abc]
123|ab12c|abc|456|abc|def => [123|ab12c|abc, 456|abc|def]
123|1abc|1abc1|456|abc|wer => [123|1abc|1abc1, 456|abc|wer]
123|abc|def|456|ghi|jkl|789|mno|pqr => [123|abc|def, 456|ghi|jkl, 789|mno|pqr]
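
If the split is needed in several places, the lookahead pattern could be compiled once and wrapped in a small helper. This is only a sketch: the class name RecordSplitter and the method name splitRecords are made up for illustration and do not come from the answer. It also shows why the lookahead matters: split discards whatever the pattern consumes, so a pattern that matches the digits themselves loses them.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class RecordSplitter {
    // A pipe is a split point only if the field after it is all digits,
    // i.e. digits followed by another pipe or by the end of the input.
    private static final Pattern BOUNDARY =
            Pattern.compile("\\|(?=\\d+(?:\\||$))");

    static String[] splitRecords(String value) {
        return BOUNDARY.split(value);
    }

    public static void main(String[] args) {
        // The extra example from the question.
        String input = "123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty";
        System.out.println(Arrays.toString(splitRecords(input)));

        // Without the lookahead, the digits are consumed as part of the delimiter.
        System.out.println(Arrays.toString("123|abc|456|def".split("\\|\\d+")));
    }
}
```

The first call keeps 1234 at the start of the second part, while the second call drops 456 because it was matched as part of the delimiter.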

CodePudding user response:

Instead of splitting, you can match the parts in the string:

\b\d+(?:\|(?!\d+(?:$|\|))[^|\r\n]+)*
  • \b A word boundary
  • \d+ Match 1+ digits
  • (?: Non capture group
    • \|(?!\d+(?:$|\|)) Match | and assert that what follows is not only digits up to either the next pipe or the end of the string
    • [^|\r\n]+ Match 1+ chars other than a pipe or a newline
  • )* Close the non capture group and optionally repeat it (use + instead to repeat one or more times, so that at least one pipe char must be matched)


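import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+";
String string = "123|abc|def|456|ghi|jkl|789|mno|pqr";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(string);
List<String> matches = new ArrayList<String>();

// Collect every value: leading digits plus the fields that belong to them
while (m.find())
    matches.add(m.group());

for (String s : matches)
    System.out.println(s);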

Output

123|abc|def
456|ghi|jkl
789|mno|pqr
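
The same matching approach could be wrapped in a reusable method and run against all four strings from the question. This is a rough sketch, not part of the original answer: the class name RecordMatcher and the method name findRecords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class RecordMatcher {
    // Leading digits, then one or more "|field" groups where the field
    // is not made up of digits only.
    private static final Pattern RECORD =
            Pattern.compile("\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+");

    static List<String> findRecords(String value) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(value);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "123|abc|123abc",
            "123|ab12c|abc|456|abc|def",
            "123|1abc|1abc1|456|abc|wer",
            "123|abc|def|456|ghi|jkl|789|mno|pqr"
        };
        for (String input : inputs) {
            System.out.println(input + " => " + findRecords(input));
        }
    }
}
```

Because the group is repeated with +, a value that has no pipe at all (for example a lone 123) is not matched; switching the final quantifier to * would accept it.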