Dart and Perl do not yield the same result with regex

Time:07-16

While porting a Perl application to Dart, I have to deal with regular expressions of the form below. The output of both the Perl version and the Dart version is included. The idea is simple: replace basic patterns at the end of a string. The result I get from the Perl fragment looks correct to me, but the result from the Dart version does not. I would appreciate help understanding where I am going wrong. Thanks in advance.

my $str ="this is a line of text ‖ ###";
print("\nIn 1 str=|$str|");
$str =~ s/###$/\n/g;
print("\nIn 2 str=|$str|");
$str =~ s/ ‖ $//g;
print("\nIn 3 str=|$str|");

output:

In 1 str=|this is a line of text ‖ ###|
In 2 str=|this is a line of text ‖ 
|
In 3 str=|this is a line of text
|

Dart code:

void main() {
var str;
str ="this is a line of text ‖ ###";
print("\nIn 1 str=|$str|");
str = str.replaceAll(RegExp(r'###$'), "\n");
print("\nIn 2 str=|$str|");
str = str.replaceAll(RegExp(r' ‖ $'), "");
print("\nIn 3 str=|$str|");
print("\n\n");
}

output:

In 1 str=|this is a line of text ‖ ###|

In 2 str=|this is a line of text ‖ 
|

In 3 str=|this is a line of text ‖ 
|

As you see:

 str = str.replaceAll(RegExp(r' ‖ $'), "");

does not replace the pattern ' ‖ $' with "", unlike its Perl equivalent.

CodePudding user response:

In Perl-dialect regular expressions, $ matches either at the end of the string or just before a newline that is the last character of the string. (The rules are a bit different in multi-line mode, but you're not using that, so we'll pretend it doesn't exist. \Z always has that same behavior, even in multi-line mode, so some people prefer it to $ for consistency.)

So the RE /g$/ will match like

some great string\n
                ^

that is, at the g at the end before that last newline. There's also \z, which always matches at the actual end of the string. /g\z/ won't match in the above example because of the newline.

In Dart-dialect regular expressions, $ seems to act like \z, so your second replacement wasn't matching because of the newline you added earlier. So if you use

    str = str.replaceAll(RegExp(r' ‖ \s*$'), "\n");

it will match as intended, and replace all that text with a trailing newline to match the Perl version. Alternatively, strip off the trailing text first and append the newline afterwards, instead of going the other way around.
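
To see the difference in isolation, here is a minimal sketch built on the string from the question. The helper name stripTrailingMarker is made up for illustration; only RegExp, hasMatch and replaceAll come from Dart's core library.

```dart
// Hypothetical helper: removes a trailing " ‖ " plus any trailing
// whitespace (including a newline) and leaves a single newline behind.
String stripTrailingMarker(String s) {
  // \s* absorbs the newline appended by the earlier replacement,
  // so $ can still anchor at the very end of the string.
  return s.replaceAll(RegExp(r' ‖ \s*$'), '\n');
}

void main() {
  var str = 'this is a line of text ‖ ###';
  str = str.replaceAll(RegExp(r'###$'), '\n');

  // Outside multi-line mode, $ only matches at the very end,
  // so the original pattern cannot match in front of the newline.
  print(RegExp(r' ‖ $').hasMatch(str)); // false

  str = stripTrailingMarker(str);
  print(str == 'this is a line of text\n'); // true
}
```

The second print compares against the string that the Perl version ends up with, rather than printing the text itself, so the trailing newline is easy to check.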

CodePudding user response:

$ is not equivalent in the two regex languages.

Dart uses the same regex language as JavaScript, and "Reference - What does this regex mean?" says the following:

  • In Perl regex, $ matches before a LF at the end of the string, and it matches at the very end of the string.

  • In JavaScript and Dart, $ matches at the very end of the string.

The rows of the following table identify equivalencies:

                     Perl ±m     Perl -m   Perl +m   JS ±m               JS -m       JS +m
Very end of string   \z                              (?![\s\S])          $
End of text          \Z          $                   (?=\n?(?![\s\S]))   (?=\n?$)
End of line          (?=\n)|\z             $         (?=\n)|(?![\s\S])   (?=\n)|$    $

(Multiline mode changes the meaning of $. "±m", "-m" and "+m" respectively mean "whether in multiline mode or not", "outside of multiline mode" and "in multiline mode".)
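
As a small illustration of the multiline note, the sketch below toggles Dart's multiLine flag on the same pattern. The function name endsLineWithG is invented for this example; the multiLine named parameter is part of Dart's RegExp constructor.

```dart
// Hypothetical helper: reports whether "g" sits at a position where $ matches.
bool endsLineWithG(String s, {bool multiLine = false}) {
  return RegExp(r'g$', multiLine: multiLine).hasMatch(s);
}

void main() {
  const text = 'some great string\n';

  // Outside multiline mode, $ is the very end of the string,
  // and the final "g" is followed by a newline.
  print(endsLineWithG(text)); // false

  // In multiline mode, $ also matches just before a line terminator.
  print(endsLineWithG(text, multiLine: true)); // true
}
```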

So, to get Perl's behaviour in Dart, you can use (?=\n?$) (in general) or \s*$ (in this case) instead of $.
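
A minimal sketch of the general approach, assuming the pattern is written without its own trailing $. The names perlEnd and replaceAtPerlEnd are hypothetical, not part of any library.

```dart
// Hypothetical constant: a lookahead that accepts the very end of the
// string or a single newline followed by the very end, like Perl's $.
const perlEnd = r'(?=\n?$)';

// Hypothetical helper: replaces `pattern` where Perl's non-multiline $
// would allow it to end. `pattern` must not include its own trailing $.
String replaceAtPerlEnd(String input, String pattern, String replacement) {
  return input.replaceAll(RegExp(pattern + perlEnd), replacement);
}

void main() {
  var str = 'this is a line of text ‖ ###';
  str = replaceAtPerlEnd(str, r'###', '\n');
  str = replaceAtPerlEnd(str, r' ‖ ', '');

  // The trailing newline survives because the lookahead does not consume it.
  print(str == 'this is a line of text\n'); // true
}
```

Because the newline is only looked at and never consumed, the replacement leaves it in place, which is what the Perl substitution does as well.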

JavaScript is great, but it really dropped the ball here.
