Split a sequence with operators and numbers on the operators or on the numbers

Time:10-05

So I have this regex that I use in Java:

String eq = "5*i -6-7 -i*5";
String[] equation_numbers = eq.split("(?<=[^ */-])[ */-]");

and somehow it works, and I get:

[5, i, -6, 7, -i, 5]

My questions are: Why does it work? (I really can't understand it.) And how can I make a regex in the same style that takes only the operators?

For example:

String eq = "5*i -6-7 -i*5";
String[] equation_operators = eq.split(...);

and the result:

[*, , -, , *]

CodePudding user response:

Your expression (?<=[^ */-])[ */-] contains:

  • A positive lookbehind: (?<=[^ */-])
  • A character class: [ */-]

The positive lookbehind contains a negated character class.

The negated character class is defined by the ^ at the beginning. It matches any character that is not one of: space, *, /, - (the - is placed last, so it is a literal character rather than a range).

The positive lookbehind only looks at the character immediately before the current position. If that character matches, the lookbehind succeeds and the expression proceeds. A lookbehind only looks: it is zero-width, so once it is done, the regex engine has not moved forward in the string being matched (eq). The position in the string remains exactly where it was when the lookbehind began.

(?<=[^ */-]) will look at the preceding character, and if and only if the preceding character is not any of these 4, the expression is allowed to move to the next part.
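A quick way to see that the lookbehind does not consume anything is to compare the first match of the expression with the first match of a version that uses an ordinary character class in place of the lookbehind. This is a minimal sketch; the class name LookbehindDemo and the helper firstMatch are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Returns the text consumed by the first match of regex in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String eq = "5*i -6-7 -i*5";
        // The lookbehind checks the '5' but does not consume it, so the match is only "*".
        System.out.println(firstMatch("(?<=[^ */-])[ */-]", eq)); // *
        // With an ordinary character class instead, the '5' becomes part of the match.
        System.out.println(firstMatch("[^ */-][ */-]", eq));      // 5*
    }
}
```

Because the lookbehind version never includes the preceding character in the match, split() only removes the operator itself and the operands survive intact.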

If we only had the lookbehind, it would succeed right before each of the marked characters (and also at the very end of the string, after the final 5):

5*i -6-7 -i*5
 ^ ^  ^ ^  ^

Everywhere else, there is either no preceding character, or the preceding character is one of those 4.
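To check the marked places, you can run the lookbehind on its own with Matcher.find() and record where each zero-width match starts. This is a sketch with hypothetical names (LookbehindPositions, positions); note that it also reports index 13, the very end of the string after the final 5, where the lookbehind succeeds but no operator follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindPositions {
    // Returns every index where the zero-width lookbehind (?<=[^ */-]) succeeds,
    // i.e. every position whose preceding character is not a space, *, / or -.
    public static List<Integer> positions(String input) {
        Matcher m = Pattern.compile("(?<=[^ */-])").matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            // The match is empty, so start() and end() are the same index.
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("5*i -6-7 -i*5")); // [1, 3, 6, 8, 11, 13]
    }
}
```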

The match

[ */-] matches only these 4 characters, and it so happens that every marked character is one of them, so it can be matched exactly where the lookbehind succeeds. So the matches of the expression on the entire string are exactly the marked characters above.
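A small sketch that prints the index and text of every match of the full expression; the class OperatorMatches and the helper matches are hypothetical names. The minus signs at indexes 4 and 9 are not reported because they follow a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorMatches {
    // Returns "index:'text'" for each match of the full expression.
    public static List<String> matches(String input) {
        Matcher m = Pattern.compile("(?<=[^ */-])[ */-]").matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start() + ":'" + m.group() + "'");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matches("5*i -6-7 -i*5")); // [1:'*', 3:' ', 6:'-', 8:' ', 11:'*']
    }
}
```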

The split

String.split(regex) splits the string at each match and drops the matched characters. So you get a String array in which the marked operators have been removed, and the pieces between them are the elements:

5 i -6 7 -i 5
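Printing the array returned by split confirms this; the class SplitOnOperators and method operands are hypothetical names for the sketch.

```java
import java.util.Arrays;

public class SplitOnOperators {
    // Splits on every operator character that directly follows a non-operator character.
    public static String[] operands(String eq) {
        return eq.split("(?<=[^ */-])[ */-]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(operands("5*i -6-7 -i*5"))); // [5, i, -6, 7, -i, 5]
    }
}
```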

Splitting on the digits instead

To split on the digits, you need to match them instead of the operators. \w is short for [0-9A-Za-z_]; the hyphens there denote ranges, so \w matches your i as well.

The only trick is to also match a leading minus sign, which you can do with ((?<!\\w)-)?. The (?<!\\w) part is a negative lookbehind: it checks the preceding character, and if that character is a \w, the lookbehind fails and the group cannot proceed. If the preceding character is not a \w (or there is no preceding character), the lookbehind succeeds and the group proceeds by matching a -. This ensures that a - is only treated as a sign when it is not directly preceded by a word character, i.e. when it follows a space, another operator, or the start of the string. The ? after the group makes it optional, so if the lookbehind fails (or there is no -), the whole group is simply skipped.
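To see the negative lookbehind in isolation, you can search for (?<!\\w)- on its own and collect the index of every - it accepts. This is a sketch with hypothetical names (SignDetection, signPositions); only the minus signs at indexes 4 and 9 qualify, while the one at index 6 is rejected because it follows the digit 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SignDetection {
    // Returns the index of every '-' that is NOT directly preceded by a word character (\w).
    public static List<Integer> signPositions(String input) {
        Matcher m = Pattern.compile("(?<!\\w)-").matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Indexes 4 and 9 follow a space; index 6 follows '6' and is rejected.
        System.out.println(signPositions("5*i -6-7 -i*5")); // [4, 9]
    }
}
```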

Now all that is needed is to match the word character after the optional -, with \w, and the expression becomes:

((?<!\\w)-)?\\w

where each backslash is doubled so the pattern can be written as a Java string literal.
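One thing to watch out for when splitting on ((?<!\\w)-)?\\w: the string starts with an operand, so the first match is at index 0, and String.split includes a leading empty string whenever there is a positive-width match at the beginning of the input (trailing empty strings are dropped). The sketch below, with hypothetical names SplitOnOperands, rawSplit and operators, shows the raw result and one way to filter the empty element out.

```java
import java.util.Arrays;

public class SplitOnOperands {
    // An optional sign (a '-' not preceded by a word character) followed by one word character.
    static final String OPERAND = "((?<!\\w)-)?\\w";

    // Plain split: the first match is the "5" at index 0, so the array starts with "".
    public static String[] rawSplit(String eq) {
        return eq.split(OPERAND);
    }

    // Same split with empty elements filtered out.
    public static String[] operators(String eq) {
        return Arrays.stream(eq.split(OPERAND))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String eq = "5*i -6-7 -i*5";
        System.out.println(Arrays.toString(rawSplit(eq)));  // [, *,  , -,  , *]
        System.out.println(Arrays.toString(operators(eq))); // [*,  , -,  , *]
    }
}
```

Also note that \\w matches a single character, which is enough for this example; multi-digit numbers would need \\w+ instead.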

CodePudding user response:

Instead of split, use a Matcher and collect all the matches into a List, using this regex:

(?<![ */-])[ */-]

This regex matches one of the [ */-] operator characters as long as it is not preceded by another of those same characters.


You may use this code in Java:

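import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String eq = "5*i -6-7 -i*5";
String re = "(?<![ */-])[ */-]";

// Matcher.results() requires Java 9 or later
List<String> matches = Pattern.compile(re)
    .matcher(eq)
    .results()
    .map(MatchResult::group)
    .collect(Collectors.toList());
//=> [*,  , -,  , *]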
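Matcher.results() was added in Java 9. If you are on Java 8, a plain find() loop collects the same matches; this is a sketch, and the class name OperatorMatcherLoop and helper operators are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorMatcherLoop {
    // Collects every match of the regex from the answer with a find() loop (works on Java 8).
    public static List<String> operators(String eq) {
        Matcher m = Pattern.compile("(?<![ */-])[ */-]").matcher(eq);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(operators("5*i -6-7 -i*5")); // [*,  , -,  , *]
    }
}
```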