How do I build a Dart RegExp properly?

Time:05-10

The goal of this expression is to split a mathematical calculation into operators, symbols, numbers, and bracketed groups.

For example:

Input string: 1+3-6*(12-3+4/5)

Output list: 1, +, 3, -, 6, *, (12-3+4/5)

So I built this expression.

It works on the web page, but in Dart code this happens:

final calculationExpression = RegExp(
  r"/(\(([a-zA-Z0-9-+/*]+)\))|([a-zA-Z0-9]+)|([+/*-]{1})/g",
  unicode: true,
  multiLine: true,
);

...

List<String> operators = calculationsString.split(calculationExpression); // Output: ["", "+", "-", ...]

What did I do wrong?

CodePudding user response:

You put the JavaScript regexp literal slashes and flags inside the Dart string.

If you remove the leading / and the trailing /g, you get the RegExp you intended. The multiLine and unicode flags are unnecessary (your regexp doesn't use any feature affected by them).

Dart's String.split treats every match as a separator and does not emit capture groups, so the matched text is discarded. You probably want to collect the matches instead of removing them.

All in all, try:

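final calculationExpression = RegExp(
    r"\([a-zA-Z\d\-+/*]+\)|[a-zA-Z\d]+|[+/*\-]");
// allMatches yields RegExpMatch objects; group(0) is the matched text.
List<String> tokens = calculationExpression
    .allMatches(calculationsString)
    .map((match) => match.group(0)!)
    .toList();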
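To make the difference between splitting and matching concrete, here is a minimal sketch that runs both against the sample input. The helper names splitPieces and matchedTokens are hypothetical, and the pattern assumes that + is both the quantifier and one of the operators.

```dart
// Assumed pattern: a bracketed group, a run of letters/digits,
// or a single operator character.
final tokenPattern = RegExp(r"\([a-zA-Z\d\-+/*]+\)|[a-zA-Z\d]+|[+/*\-]");

// Hypothetical helper: split treats every match as a separator,
// so only the text between matches is returned.
List<String> splitPieces(String input) => input.split(tokenPattern);

// Hypothetical helper: allMatches yields the matches themselves;
// group(0) is the full matched text.
List<String> matchedTokens(String input) =>
    tokenPattern.allMatches(input).map((m) => m.group(0)!).toList();

void main() {
  const input = '1+3-6*(12-3+4/5)';

  // Every character belongs to some match, so only empty strings remain.
  print(splitPieces(input));

  // Prints: [1, +, 3, -, 6, *, (12-3+4/5)]
  print(matchedTokens(input));
}
```

Note that this sketch silently skips any characters the pattern does not match; it does not validate the input.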

CodePudding user response:

  1. The syntax /pattern/g is used to create regular expression literals in JavaScript (and sed and some other languages), just as quotes are used to create string literals. Dart doesn't have regular expression literals; you instead must invoke the RegExp constructor directly. Combining a regular expression literal syntax with an explicitly constructed RegExp object makes no sense. When you do RegExp(r'/pattern1|pattern2|pattern3/g'), you're actually matching against /pattern1 (pattern1 prefixed with a literal / character) or pattern2 or pattern3/g (pattern3 followed by a literal string /g).

  2. String.split does not split the input string such that each element of the result matches the pattern. It treats all matches of the pattern as separators. Consequently, the resulting list will not have any elements that match the pattern, which is the opposite of what you want. You instead want to find all matches of the pattern in the string, so you can use RegExp.allMatches if you also verify that the input string contains only matches from the regular expression.
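The first point can be checked with a small sketch. The helper allMatchTexts and the test string are made up for illustration; the point is only that the slashes and the g are matched as ordinary characters.

```dart
// Hypothetical helper: returns the full text of every match.
List<String> allMatchTexts(RegExp re, String input) =>
    re.allMatches(input).map((m) => m.group(0)!).toList();

void main() {
  const input = 'a b c /a c/g';

  // JavaScript-style delimiters inside the Dart string become part of
  // the pattern: this matches "/a", "b", or "c/g".
  final withSlashes = RegExp(r'/a|b|c/g');

  // Without the delimiters, the alternatives are just "a", "b", "c".
  final withoutSlashes = RegExp(r'a|b|c');

  // Prints: [b, /a, c/g]
  print(allMatchTexts(withSlashes, input));

  // Prints: [a, b, c, a, c]
  print(allMatchTexts(withoutSlashes, input));
}
```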

Putting it all together:

void main() {
  final calculationExpression = RegExp(
    r"(\(([a-zA-Z0-9-+/*]+)\))|([a-zA-Z0-9]+)|([+/*-]{1})",
    unicode: true,
    multiLine: true,
  );

  var calculationsString = '1+3-6*(12-3+4/5)';

  // Prints: [1, +, 3, -, 6, *, (12-3+4/5)]
  print(calculationsString.tokenizeFrom(calculationExpression).toList());
}

extension on String {
  Iterable<String> tokenizeFrom(RegExp regExp) sync* {
    void failIf(bool condition) {
      if (condition) {
        throw FormatException(
          '$this contains characters that do not match $regExp',
        );
      }
    }

    var matches = regExp.allMatches(this);
    var lastEnd = 0;
    for (var match in matches) {
      // Verify that there aren't unmatched characters.
      failIf(match.start != lastEnd);
      lastEnd = match.end;

      yield match.group(0)!;
    }

    failIf(lastEnd != length);
  }
}